Soft Sensor for Snack Food Textural Properties Using On-Line

The digitized time-domain signals are transformed to the power spectrum in real time by FFT on a frame-by-frame basis. Each frame contains 2048 data p...
0 downloads 0 Views 318KB Size
864

Ind. Eng. Chem. Res. 2007, 46, 864-870

Soft Sensor for Snack Food Textural Properties Using On-Line Vibrational Measurements Mark-John Bruwer,† John. F. MacGregor,*,† and Wilfred M. Bourg, Jr.‡ Department of Chemical Engineering, McMaster UniVersity, 1280 Main Street West, Hamilton, Ontario, Canada L8S 4L7, and Frito-Lay, Inc., 4219 CR 277, Melissa, Texas 75454

It is common in the food industry to produce solid-phase products that are not amenable to ready characterization on the production line via traditional on-line instrumentation. This motivates research into novel sensor technologies. In this work, an on-line vibrational sensor is used to develop a soft sensor for real-time prediction of product texture in a commercial snack food process. This in turn enables real-time multivariate statistical process control (MSPC) and indicates the opportunity for automated feedback control. An accelerometer is used to record the vibrational signature generated by a snack food product falling onto a metal surface. The frequency distribution of the acoustic signature is obtained via the discrete Fourier transform. This provides a predictor space from which the textural properties are modeled using partial least-squares (PLS). Excellent results are obtained, with R2 upward of 96% on an independent test set for two properties that fully span the texture space of this product. 1. Introduction The purpose of this work was to develop an on-line passive vibrational sensor to provide real-time prediction of the textural properties of a commercial snack food produced by Frito-Lay, Inc. These properties were obtained using three different types of measurements: (a) expert sensory panel assessment (five properties), (b) visual assessment of blister level, and (c) a mechanical product-breaking instrument that gives force versus distance data. It is shown in a companion paper1 that this diverse set of textural property measurements can be fused using principal components analysis (PCA) into two key pseudotexture properties, “blister level” and “brittleness”, that can predict approximately 90% of the variation in the larger set of measurements. On-line prediction of these two pseudoproperties would enable real-time multivariate statistical process control (MSPC)2 as well as open the possibility for automated feedback control of the product quality.3-5 It has long been recognized that vibrational signals from a process can contain valuable information on that process. The information can be categorized into (a) equipment condition, (b) process condition, and (c) product condition. An example in the first category is fault detection for mechanical equipment, such as demonstrated by Tan et al.6 on a washing machine case study and Ocak and Loparo7 for monitoring bearings. Minion et al.8 provide an example for the second category; the vibrations from a liquid steel ladle were measured to estimate the degree of stirring achieved by sparged gas. Esbensen et al.9 provide another example in that they used vibrations from pneumatic conveying of granular materials to estimate the mass flow. A third example, at laboratory scale, is provided by Belchamber et al.,10 in which they tracked a batch process (the hydration of silica gel) through acoustic emission. The third category represents perhaps the most ambitious use of vibrational signals from a process, property prediction of the final product. This is akin to “soft sensor” technology (cf., Kresta et al.11), the key being that vibrational signals rather than traditional process * To whom correspondence should be addressed. Tel.: +1 (905) 525-9140. Fax: +1 (905) 521-1350. E-mail: [email protected]. † McMaster University. ‡ Frito-Lay Inc.

measurements (pressure, temperature, flow, etc.) provide the input variables from which product properties are predicted. For example, Halstensen and Esbensen12 used the vibrations generated by powder particles falling onto a surface to estimate the particle size distribution. Their results were based on a laboratory experiment, with the ultimate goal to implement the technology in an industrial process. Similarly, Bjo¨rk and Danielsson13,14 used vibrations from a wood pulp conveying line to predict key product properties. The focus of this research is on vibrational signals acquired “passively” from the process using accelerometers and/or microphones. This is in contrast to “active” methods, such as ultrasound, in which high-frequency sound is input to the process and the transmitted signal measured and analyzed. The simplicity and cost effectiveness of passive methods favors them for ready implementation in industry. Nevertheless, it is essential to carefully choose the location of the vibration measurement. Moreover, slight process modifications may be warranted to enhance the information content of the vibrational signal. For example, Esbensen et al.15 present an interesting innovation in which they place an orifice plate constriction in a flow line for this reason. In many cases the frequency content of the vibrational signature produced by a process or product can be considered to be stationary over the (subsecond) time window of data acquisition. In this case the common analysis method is the fast Fourier transform, FFT.9,16 Bjo¨rk and Danielsson14 present an alternative approach in which a multiresolution wavelet decomposition17 is performed on the time-domain signal, followed by FFT of the wavelet coefficients at each wavelet scale. They demonstrate in their wood pulp case study that this produces superior results to simply applying FFT to the original timedomain signal. A prediction model is built from the frequency-domain features to the product or process characteristics of interest. Since there are a large number of highly correlated frequencydomain features, a latent variable method such as partial leastsquares (PLS) is appropriate for fitting models.9,14,18 Appendix A provides a brief tutorial on latent variable methods, with the focus on PCA and PLS, as well as suggested references for further reading. In this work, the relationship between X and Y

10.1021/ie060832r CCC: $37.00 © 2007 American Chemical Society Published on Web 12/23/2006

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007 865

was found to be sufficiently linear that excellent results were obtained with linear PLS. Therefore, nonlinear methods, such as nonlinear PLS,19-21 were not needed. 2. Materials and Methods The vibrational signals generated by the product at key points in a pilot plant of Frito-Lay were recorded during a designed experiment. These experiments were implemented to span the full multivariate texture space, which was essentially captured by the pseudoproperties “blister level” and “brittleness”. This section details the process description, designed experiment implemented, vibrational measurement equipment used, and the signal processing and analysis methods employed. 2.1. Process Description. Raw corn is cooked in water, allowed to soak for several hours, then washed to remove the loose hulls. Thereafter, it is drained and then milled to produce a dough. This mill comprises two parallel stone plates, one fixed, the other rotated at high speed. The particle size distribution of the dough is manipulated by altering the gap between the plates. Adjustments can only be made when the mill is not operating. The dough empties into a surge hopper. Water can be added as a continuous stream at this point to adjust the bulk moisture fraction to the desired level. Thereafter, the dough is pressed into a sheet of the required thickness. The product bases are cut out of this sheet via an automated cutter. The bases travel through a two-level gas-heated oven via a system of conveyor belts. The air temperatures in various zones of the oven are adjustable. Thereafter, the bases enter an oil deep fryer and travel through it cocurrent with the oil. They are extracted by another conveyor belt. The bases travel for some distance on a series of conveyor belts and a bucket elevator. This provides time for them to cool and be drained of excess oil. An on-line near-infrared camera is positioned at the end of this line to provide real-time prediction of fat and moisture content. The bases fall off the end of the conveyor belt onto a stainless steel spiral chute. The chute empties into a rotating drum that is fitted with an oil spray boom at the front end and seasoning spray boom at the back. The drum is inclined slightly in a downhill manner to ensure that the bases travel forward through it. The oil spray wets the surface of the bases to enable the seasoning to adhere when it is sprayed on. The finished product exits the drum onto another conveyor system that transports it to the packing line. There it is packed into small heat-sealed bags for preservation via oxygen and moisture barriers. The bagged product is manually packed into boxes. 2.2. Experimental Design. The purpose of the experimental design was to produce product at different process conditions so as to span the entire space of textural properties likely to be encountered in commercial production. Five inputs to the process (factors) were identified that have a significant effect on texture, namely, the mill gap (affecting mean particle size in the dough), dough moisture fraction (altered by manipulating the flow of water into the dough hopper), product thickness (quantified by the average weight of 10 bases), fryer oil temperature, and the temperature profile in the oven. These process inputs are known to affect product texture via various mechanisms. Hence all five were included in a 25-1 fractional factorial design.22 The process was allowed to reach steady state at each condition, and then three sets of samples were packaged, at the start, middle, and end of each steady-state period (typically 5-10 min). In addition, when possible a sample was packaged during the transition from one

Figure 1. Vibrational measurement equipment setup in the pilot plant: (a) accelerometers mounted on the underside of the stainless steel chute; (b) hydrophone mounted off the oil spray boom inside the rotating drum.

steady state to the next. Time limitations on the pilot plant only allowed 12 of the 16 experimental conditions to be implemented. [Since the 16 conditions were randomized before implementation, the final 4 conditions, that could not be implemented due to the unforeseen time limitations, form a random subset of the original 16.] The net result was a set of 41 packaged samples that spanned the space of expected textural properties. [The 41 samples comprised 36 from the 12 steady-state conditions (3 per steady state), plus an additional 5 samples collected during transitions between steady states.] From this point forward, each such sample will be termed an “experimental sample”. 2.3. Equipment. Two types of vibrational sensors were tested in this research, namely, accelerometers and hydrophones. Two accelerometers were mounted on the underside of the stainless steel chute that feeds the seasoning drum. The product creates a vibrational signature as pieces of product fall from an overhead conveyor belt onto this chute. Mounting studs were welded to the chute and the accelerometers screwed into these, thus providing a stiff connection for transmission of the vibration signal. [In some processes it may be more appropriate to use a mounting clip that is secured with an epoxy adhesive.] In addition, the cables were taped down to prevent the generation of spurious signals via cable oscillations. The accelerometers used were Dytran 3184E (0-5 kHz range, sensitivity 100 mV/ g), and 3019A1 (0-20 kHz, 10 mV/g). A hydrophone was mounted off the oil spray boom in the rotating drum for product seasoning. This hydrophone records the continuous sound of the product cascading on itself inside the drum. The hydrophone used was the ST191-BNC from Cetacean Research Technology (0-20 kHz, sensitivity not specified). Figure 1 provides photographs and schematics of the vibrational measurement equipment setup. 2.4. Signal Processing. The vibrational sensors used in this research cover the audible range (0-20 kHz). There certainly may be valuable information in the frequency range above this; however, the higher the frequencies recorded the greater the demands on the data acquisition and processing system (with associated increase in cost). Furthermore, in this application it was known that the operators could intuitively detect changes in product texture by listening to the acoustic signature it creates. A digital signal processor (DSP) was purchased from Dactron, namely, the Photon-100-04. This system can acquire four

866

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007

channels synchronously up to a maximum frequency of 21 kHz. All calculations are performed with 32-bit precision. The unit is small and rugged and can be connected to a laptop via the USB port, thus providing a portable solution for feasibility studies. Complementary software on the laptop allows real-time display and archiving of the vibrational data. The DSP performs 18-bit analog-to-digital (AD) conversion and applies analog and digital filters to prevent aliasing and phase distortion. The analog signal is sampled at 48 kHz. The digitized time-domain signals are transformed to the power spectrum in real time by FFT on a frame-by-frame basis. Each frame contains 2048 data points and lasts 42.667 ms in the time domain. Thus, there are approximately 23 power spectra generated per second per channel. The first 900 discrete frequencies (0-21.1 kHz, 23.44 Hz resolution) of each power spectrum are transferred to the laptop for further processing and archiving. The DSP also allows the application of a time-domain windowing function to reduce spurious harmonics in the frequency-domain power spectrum due to the discontinuity that results from a finite-length data window. Examples include the Bartlett and Hamming window functions. These functions also serve to smooth the power spectrum.16 The particular windowing function chosen was the Hanning window for a balance of good reduction in spectral leakage and reasonable amplitude accuracy.23 The Fourier transform is one of the few mathematical transforms that does not show a reduced error variance when the number of data points is increased. Instead, the extra data are used to increase the frequency resolution. However, in practice a lower error variance for each of a set number of discrete frequencies may be preferable. This can be achieved by dividing the time-domain data into contiguous frames, performing FFT on each frame, and then averaging the calculated power for each discrete frequency over all the frames. In order to perform such averaging on-line, a common method is an exponentially weighted moving average (EWMA) filter. This is the method that was selected in the Dactron software, RT Pro. The weighting parameter was chosen to give an equivalent continuous-time first-order filter time constant of approximately 3 s. This gives a suitable balance between sufficient averaging on the one hand and responsiveness relative to dominant process dynamics on the other, the latter being on the order of minutes. 2.5. Analysis Methods. During experimentation, nominal time delays between key locations in the pilot plant process were determined by measuring the mean transit time of a set of dyed corn chips. In particular, it was found that the mean time delay between the location of (a) the accelerometers and hydrophone (all close together in the process) and (b) the machine where samples were packaged, was around 7.5 min. The variability around this nominal value is probably of the order of (1 min. Thus, the uncertainty in the time stamp of the vibrational power spectrum that corresponds to a particular packaged sample is significant. For this reason it was decided that for the inferential sensor modeling, only those samples that correspond to a significantly long process steady state (say 15 min or more) would be used. A set of approximately 30 contiguous power spectra obtained using the previously described windowing and averaging were chosen for each steady-state sample, based on a time-stamp range centered on the nominal time delay (7.5 min). This equates to a time range of around 5 min since the power spectra are 10 s apart. In addition, the real-time fat and moisture readings from

the on-line near-infrared sensor (located immediately upstream of the accelerometers) were used as an indicator to eliminate time periods when the process was clearly not at steady state. A separate PCA model was built on the set of 30 power spectra collected for each experimental sample. [These power spectra were arranged in the data matrix, X, so that each row is a unique power spectrum, while each column is a unique discrete frequency. Each column of X was then mean centered and scaled to unit variance.] The first three latent variables (which typically predicted 40-50% of the variance under cross-validation) were analyzed jointly. Observations (i.e., the corresponding power spectra) were eliminated if (a) they formed part of a structured trend as a function of time in the score plot, (b) the score values were significantly outside the Hotellings T2 95% confidence ellipsoid, (c) the distance to the model (squared prediction error) for that observation was significantly above the 95% upper confidence limit (see Kourti24 or Eriksson et al.25 for the statistical definitions). This approach eliminated any abnormal power spectra (outliers). The power spectra were averaged for each experimental sample, transformed via base 10 logarithm, then mean centered and scaled to unit variance. [Mean centered so that the predictor variables become the deviations around the mean spectrum.] The two pseudotexture Y-variables were also mean centered and scaled to unit variance. 3. Results and Discussion In the end, only the vibrational data from the 0-5 kHz accelerometer were used. Although the 0-20 kHz accelerometer had a greater frequency range, its sensitivity was 10 times less than that of the 0-5 kHz accelerometer. This proved to be too insensitive for this application. Furthermore, although the 0-5 kHz accelerometer experiences attenuation above 5 kHz, it was still possible to obtain meaningful information up to 20 kHz due to the greater sensitivity of this sensor. The hydrophone was expected to give an excellent acoustic signal since it was mounted inside the seasoning drum (thus largely shielded from background noise) and was recording the continuous signal of the product cascading on itself. However, this sensor showed significant drift through the course of the experiments. The reason is believed to be due to the oil mist slowly permeating and swelling the polymer casing surrounding the hydrophone. Thus, while this location may still be a good one, a hydrophone-like instrument is needed whose casing is impermeable to oil. The results presented in this section pertain to the inferential sensor developed using the 0-5 kHz accelerometer. 3.1. Texture Space Data. The companion paper1 demonstrates the fusion of multiple texture measurements into a single principal component analysis (PCA) model.26 Figure 2 shows the score plot of this model with the two significant latent variables renamed as “blister level” and “brittleness” (for reasons explained in the companion paper). In addition to the removal of a few outliers during the texture space modeling phase, a few more observations were eliminated for the inferential sensor modeling since they corresponded to samples collected during process transients. Only those samples retained for the inferential sensor modeling are included in the figure. The encircled observations were selected as the test set for the inferential sensor model. These were chosen pseudorandomly in that the only criterion used was that they adequately span the texture space in both the “blister level” and “brittleness” directions. 3.2. Vibrational Signatures. Figure 3 shows a set of sample vibrational signatures (power spectra) from the 0-5 kHz

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007 867

Figure 2. Texture space model, showing observations included in inferential sensor modeling. Encircled observations were selected as the test set.

Figure 3. Selected power spectra from the 0-5 kHz accelerometer. Each spectrum is the mean of all spectra corresponding to that experimental sample. Note: the spectra here and in Figures 4 and 7 are shown log transformed but without further processing. Prior to PLS modeling, each discrete frequency in the spectrum was, in addition, mean centeredsgiving deviations around the mean spectrumsand scaled to unit variance.

passing through (the ordinate follows a logarithmic scale). Furthermore, this phenomenon was observed both when the background noise was measured before and after the main designed experiment (i.e., about 12 h apart). This situation is ideal since it implies that variations in the background noise signature will have minimal influence on the robustness of the inferential sensor models. 3.3. Inferential Sensor Models. The following preprocessing leads to the best PLS model (in terms of maximum Q2 on the independent test set). The power spectra were averaged for each experimental sample, transformed via base 10 logarithm, then mean centered and scaled to unit variance. The two Y-variables were also mean centered and scaled to unit variance. The number of latent variables were selected so as to maximize, on the independent test set, Q2 for the Y-variables. A model with eight latent variables simultaneously achieved this for both Yvariables, giving Q2 values of 97.1% for blister level and 96.1% for brittleness (independent test set statistics). This model showed no significant outliers on the training set, both in terms of variation in the score space (Hotellings T 2 confidence ellipse) and squared prediction error. Furthermore, plots of the inner relation for each component did not show any sign of nonlinearity (see Eriksson et al.25 for details on these diagnostic tools). Figure 5 shows the texture data against the model predictions for both “blister level” and “brittleness”, on both the training and test data sets. It should be noted that PLS-1 and PLS-2 modeling were found to give very similar results. [PLS-2 jointly predicts all the Y-variables, whereas PLS-1 models each Y-variable separately.] Thus a PLS-2 model was selected since a single model is easier to manage on-line both in terms of predicting the Y-variables and in terms of monitoring new X-space observations (for consistency with the X-space correlation structure on which the model was trained). Eriksson et al.25 define a “variable importance in the projection” (VIP) metric for the kth X-space variable in a PLS model via the following:

x∑ A

VIPk )

(

wak2VARa)(K/VARA)

a)1

Figure 4. Selected spectra with product passing through the process (same selection as Figure 3) vs the mean spectrum of the background noise (including oil and seasoning delivery systems).

accelerometer. Each signature is the mean power spectrum over all spectra (up to 30) collected for the corresponding experimental sample. The samples were selected so as to span the texture space. Figure 4 is a repeat of Figure 3 with the background noise spectrum added. The latter was obtained by recording the vibrational signal generated with all the process equipment running (including the oil and seasoning delivery systems in the rotating drum) but without product going through the process. The background noise spectrum shown is the averaged spectrum over 5 min of data. The key feature to be observed is that, across most of the frequency range, the background noise is orders of magnitude less than the signal obtained with product

where k is the variable index and K the total number of variables, a is the latent variable index and A the total number of latent variables. wak is the latent vector weight of the X-variable k for the ath latent variable (see Appendix A). VARa is the percentage of the total variance in the Y-space (i.e., the percentage of the trace of Y TY) that is explained by the ath latent variable, while VARA is the total percentage variance explained by the PLS model. It is instructive to plot VIP for all the variables in the model, sorted in descending order. Variables whose VIP value is considerably less than 1 do not contribute significantly to the model and could potentially be pruned. Variable pruning is useful to obtain a more parsimonious, and hence typically more robust, model. Alternative variable pruning approaches may be found in the literature. For example, Westad and Martens27 present a method based on model parameter confidence intervals obtained via a jack-knife-based method. Alternatively, Hageman et al.28 present a deterministic global optimization technique termed “tabu search”. Figure 6 shows the sorted VIP plot for the 900 frequency variables in this PLS model. It is clear from the plot that none of the variables can be considered to be insignificant in the model. Thus, no variables were pruned. The corollary is that

868

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007

Figure 5. Inferential sensor model predictions vs measured values of the textural features “blister level” and “brittleness”: (a and c) training data sets; (b and d) test data sets; the dashed line indicates the perfect model.

Figure 6. Variable influence on projection (VIP) plot. The original frequency variables have been resorted according to descending VIP values.

the information on product textural properties is distributed over the full audible frequency range in the vibrational signal. 3.4. Sources of Uncertainty. The greatest source of uncertainty during the pilot plant experiment was the time delay between product generating a vibrational signature recorded by the accelerometer and that same product sample reaching the packaging machine. This causes significant sampling uncertainty, which was overcome to a large degree by only using samples collected during steady-state process conditions. In future applications, however, it is recommended that samples be manually collected at the closest possible point in the process to the accelerometer and then manually packaged into heatsealed bags. This would eliminate most representative sampling concerns and allow for modeling short-term dynamic changes. Another significant cause of variation in the vibrational signature is changes in production rate. These can be steady state (changes to nominal production rate) or transient (small surges in localized mass flow of product from the conveyor belt onto the chute). The latter is likely superimposed on any steady-state condition. An experiment was performed to investigate the production rate effect. While the process was at steady state, 50% of the product was scooped off the conveyor system (by emptying

Figure 7. Effect of production rate on the power spectrum; gray ) full capacity, black ) half capacity.

alternate buckets in a bucket elevator section) in a continuous manner. This was maintained for over 10 min. Figure 7 shows the mean spectrum over 5 min of data (30 spectra), both at the full and 50% production rate. It is clear that an increase in production rate generally causes the power to increase across the audible frequency range. This result can be used in one of two ways. First, if a realtime measurement of production rate is available, this can be used to remove the effect by regressing the power in each discrete frequency onto the measured production rate and taking residuals. Alternatively, differencing the two power spectra shown in Figure 7 and normalizing the result to unit length provides a vector, q, oriented with the effect of production rate:

q ) (s100 - s50)./(

∑i (s100,i - s50,i)2)0.5

(2)

where s is a vector containing the elements of the power spectrum, the subscripts “100” and “50” indicate the percentage of full production rate, the operator “./” indicates element-byelement division, and the subscript i indicates the ith element

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007 869

of s. This vector can then be used to project out the effect of production rate via the following:

xnpr ) x - qqTx

(3)

where x is a future realization of the power spectrum, that has also been mean centered, and the subscript “npr” indicates “no production rate (effect)”. All vectors are column vectors. This issue was further investigated using O-PLS. A variant on PLS, O-PLS produces a model with exactly the same predictive power but divides the X-space into a subspace orthogonal to Y and one correlated with it.29 Thus, the orthogonal subspace found by PLS can be analyzed explicitly. In particular, the angle between the vector direction for production rate, q, and the orthogonal subspace, Portho, was determined as follows. [Portho is a matrix whose column vectors comprise the X-space loading vectors found by O-PLS. These vectors form a basis for the subspace that is orthogonal to Y.] Regressing q onto Portho gives the best prediction of q based on information in the orthogonal subspace Portho. Geometrically speaking, this regression produces a vector, q|, that is the orthogonal projection of q onto Portho. [The notation q| indicates that this vector is the component of q that is parallel with the subspace spanned by Portho.] It is calculated via q| ) Portho(PorthoTPortho)-1PorthoTq. Then, from trigonometry, the angle between q and Portho is given by cos θ ) |q||/|q|. Given the length of q is 1, the length of q| was found to be 0.977, and the length of the component of q orthogonal to this (q⊥) 0.213. Finally, the corresponding angle between q and Portho is 12.27°. This indicates that q lies mostly in the subspace orthogonal to Y, but there is a small component (magnitude 0.213) correlated with Y. Hence, for small variations in production rate, the influence on the prediction of Y is small. However, if large variations occur (such as significant set point changes in the nominal production rate) then it can be expected that a more robust inferential sensor will be obtained if the production rate effect on the vibrational spectra is first projected out via one of the two methods described previously. It is reasonable to assume that, in future applications of this technology, due care will be taken to install the accelerometer so as to ensure (a) a stiff connection to the (e.g., metal) surface and (b) the cable is constrained to prevent spurious signals from cable oscillations. However, even with a properly installed sensor, the vibrational signature, for the exact same product texture and production rate, will be significantly affected by the geometry of the measurement location. For example, the distance the product falls between the conveyor belt and the metal surface will have a significant effect. Since identical geometry is nearly impossible to achieve across multiple plants, some model recalibration effort will be required, as is the case with other inferential sensors.

Acknowledgment The authors thank Frito-Lay engineers Christina Malvaiz and Adam Warren, and the operations personnel, for their assistance in the pilot plant experiments, Frito-Lay management staff Steve Bresnahan for supporting this research, and McMaster University graduate student Zheng Liu for the use of his MatLab O-PLS code. The authors also thank the McMaster Advanced Control Consortium (MACC) and the Canadian Natural Science and Engineering Research Council (NSERC) for financial support in aid of this research. A. Appendix: Latent Variable Methods Multivariate statistical modeling involves one or both of two types of data sets, a (N × K) X-matrix, or predictor space, and a (N × M) Y-matrix, or response space. In either matrix, the data are arranged so that each row corresponds to a unique observation, while each column corresponds to a unique variable. In this work K is 900 since there are 900 discrete frequencies in the power spectrum, and M is 2 since there are two Y-variables. Although K- or M-space may be high dimensional, in many situations the statistical rank of these spaces (i.e., of the corresponding matrices, X and Y) is significantly less due to a significant degree of correlation among the variables, such as among the 900 discrete frequencies in the power spectrum mentioned above. Latent variable methods exploit this by assuming there is an underlying “latent structure” that defines an (often) low-dimensional subspace in which the system under observation moves. If the system has no natural set of response variables its latent structure is sought by reduced-rank modeling of the X-space alone, for example via PCA. When response variables exist, latent variable modeling seeks to find subspaces of X and Y that are highly related to each other in some mathematical sense. For example, PLS finds the two subspaces of maximum covariance. Often, the columns of X and/or Y are mean centered before latent variable modeling so that the gross variation from the origin does not need to be modeled. Furthermore, since these modeling methods are scale dependent, a common approach is to scale each column to unit variance. Burnham et al.30 establish a common mathematical framework for the various latent variable methods. This is the basis for the mathematical formulations presented below. Furthermore, the book by Eriksson et al.25 provides a good description of PCA and PLS with numerous examples. A.1. Principal Components Analysis (PCA). The latent variables (also termed “principal components”) determined by PCA are the linear combinations (ta ) xTpa; a ) 1, ..., A) of the original variables of X that capture the largest variance directions of its column space, ordered from largest to smallest:

pa } max pTa XTXpa

4. Conclusions An inferential sensor for snack food textural properties has been successfully developed using on-line vibrational data collected during a designed experiment on a pilot plant of FritoLay, Inc. Excellent predictive power is demonstrated for the properties “blister level” and “brittleness”. The on-line implementation of this inferential sensor enables real-time prediction of these properties. This in turn provides the opportunity for real-time statistical process control (monitoring of product quality), and indicates the potential for automated feedback control.

pa

s.t.

a ) 1, ..., A

pTa pa ) 1

and

pTi pj ) 0

∀ i,j s.t. i * j

(A1)

The solution is the ordered set of eigenvectors (normalized to unit length) associated with the largest eigenvalues (λa) of the covariance matrix, XTX. In the terminology of PCA, each such latent vector is termed a “loading vector”, with each coefficient

870

Ind. Eng. Chem. Res., Vol. 46, No. 3, 2007

termed a “loading”. For a given observation, xi, the projection onto the PCA model gives a set of “scores”, ti, via tiT ) xiTP, where the columns of P contain the latent vectors described above. For the entire data matrix we have X ) TPT + E, where the ith row of T is tiT and E is the residual matrix. The dimension of the PCA model, A, is the number of latent vectors selected (i.e., the A largest eigenvectors). The goal is to choose A so as to model all the significant systematic structure in the data without starting to fit the random noise. A number of criteria exist to decide A, with cross-validation being a popular choice.25 A.2. Partial Least-Squares (PLS). The first latent vector, w1, is determined so as to maximize the covariance between the latent variable, t1 ) Xw1, and the original Y-matrix. [In the context of PLS, the latent vectors, wa (a ) 1, ..., A), are termed “weight vectors”, with the individual element for each Xvariable, wak, termed a “weight”.]

max w1

s.t.

wT1 XTYYTXw1 wT1 w1 ) 1

(A.2)

The solution is the largest eigenvector (normalized to unit length) of the covariance matrix, XTYYTX. The second latent vector, w2, is the solution to eq A.2 with X replaced by X2, where X2 is the subspace of the column space of X that is orthogonal to t1. This is repeated for the third latent vector, and so on, with Xi+1 calculated via the following:

( )

Xi+1 ) I -

titiT

tiTti

Xi

(A.3)

where X1 ≡ X. The ith latent variable is then given by ti ) Xiwi. These are concatenated into the score matrix, T. The prediction model for Y is determined by the least-squares projection of Y onto T, i.e., Y ) TC + F, where C is the matrix of regression coefficients (C ) (TTT)-1TTY) and F is the residual matrix. Since the columns of T are orthogonal to each other, this least-squares regression problem is very well-conditioned, leading to stable regression coefficients in C. By contrast, if the variables in the original predictor matrix, X, are highly correlated, the regression coefficients obtained from ordinary least-squares (OLS) can be very unstable (high error variance) since (XTX) must be inverted in this case. Cross-validation is again a popular choice for determining the number of latent variables in the model, A. A key advantage of PLS over other regression methods (such as OLS, ridge regression, neural networks, etc.) is that it explicitly models the X-space via the latent vectors, [w1, w2, ..., wA]. This can often provide valuable insight into the correlation structure among the X-variables. Furthermore, this X-space model can readily be used to test whether future observations, xiT, are consistent with the correlation structure in the training set. This helps prevent inadvertently extrapolating beyond the data space for which the model is valid. Finally, the X-space model also allows easy treatment of missing data.

Literature Cited (1) Bruwer, M.-J.; MacGregor, J. F.; Bourg, W. M. Food Qual. Pref. Submitted for publication, 2006. (2) Kourti, T.; MacGregor, J. F. J. Qual. Technol. 1996, 28, 409-428. (3) Chen, G.; McAvoy, T. J.; Piovoso, M. J. J. Process Control 1998, 8, 139-149. (4) Clarke-Pringle, T.; MacGregor, J. F. Ind. Eng. Chem. Res. 1998, 37, 3992-4002. (5) Yacoub, F.; MacGregor, J. F. Chemom. Intell. Lab. Syst. 2004, 70, 63-74. (6) Tan, C. C.; Thornhill, N. F.; Belchamber, R. M. Trans. Inst. Meas. Control 2002, 24, 333-353. (7) Ocak, H.; Loparo, K. A. A new bearing fault detection and diagnosis scheme based on hidden markov modeling of vibration signals. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, 2001; Vol. 5. (8) Minion, R. L.; Leckie, C. F.; Legeard, K. J.; Richardson, B. D. Iron Steelmaker 1998, 25, 25-31. (9) Esbensen, K. H.; Halstensen, M.; Lied, T. T.; Saudland, A.; Svalestuen, J.; de Silva, S.; Hope, B. Chemom. Intell. Lab. Syst. 1998, 44, 61-76. (10) Belchamber, R. M.; Betteridge, D.; Collins, M. P.; Lilley, T.; Marczewski, C. Z.; Wade, A. P. Anal. Chem. 1986, 58, 1873-1877. (11) Kresta, J. V.; Marlin, T. E.; MacGregor, J. F. Comput. Chem. Eng. 1994, 18, 597-611. (12) Halstensen, M.; Esbensen, K. J. Chemom. 2000, 14, 463-481. (13) Bjo¨rk, A.; Danielsson, L.-G. Predicting pulp quality from process acoustic measurements on a medium consistency pulp stream. In Control Systems 2002, Stockholm, Sweden, 2002. (14) Bjo¨rk, A.; Danielsson, L.-G. J. Chemom. 2002, 16, 521-528. (15) Esbensen, K. H.; Hope, B.; Lied, T. T.; Halstensen, M.; Gravermoen, T.; Sundberg, K. J. Chemom. 1999, 13, 209-236. (16) Koenig, D. M. Control and Analysis of Noisy Processes; Prentice Hall: Englewood Cliffs, NJ, 1991. (17) Chau, F.-T.; Liang, Y.-Z.; Gao, J.; Shao, X.-G. Chemometrics: From Basics to WaVelet Transform; Wiley: Hoboken, NJ, 2004. (18) Næs, T.; Isaksson, T.; Fearn, T.; Davies, T. A User-Friendly Guide to MultiVariate Calibration and Classification; NIR Publications: Chichester, U.K., 2002. (19) Wold, S.; Kettaneh-Wold, N.; Skagerberg, B. Chemom. Intell. Lab. Syst. 1989, 7, 53-65. (20) Qin, S. J.; McAvoy, T. J. Comput. Chem. Eng. 1992, 16, 379391. (21) Holcomb, T. R.; Morari, M. Comput. Chem. Eng. 1992, 16, 393411. (22) Box, G. E. P.; Hunter, W. G.; Hunter, J. S. Statistics for Experimenters; Wiley: New York, 1978. (23) RT Pro Dynamic Signal Analysis: User Guide, revision 3.0; Dactron Inc.: Milpitas, CA, 2001. (24) Kourti, T. Int. J. Adapt. Control Signal Process. 2005, 19, 213s 246. (25) Eriksson, L.; Johansson, E.; Kettaneh-Wold, N.; Wold, S. Multiand MegaVariate Data Analysis: Principles and Applications; Umetrics Academy: Sweden, 2001. (26) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37-52. (27) Westad, F.; Martens, H. J. Near Infrared Spectrosc. 2000, 8, 117124. (28) Hageman, J. A.; Streppel, M.; Wehrens, R.; Buydens, L. M. C. J. Chemom. 2003, 17, 427-437. (29) Trygg, J.; Wold, S. J. Chemom. 2002, 16, 119-128. (30) Burnham, A. J.; Viveros, R.; MacGregor, J. F. J. Chemom. 1996, 10, 31-45.

ReceiVed for reView June 29, 2006 ReVised manuscript receiVed October 25, 2006 Accepted November 13, 2006 IE060832R