Ind. Eng. Chem. Res. 1988,27, 343-350
ZH = integration constant in Kirchoff's equation, cabmol-l ZK = integration constant in van't Hoff s equation, adimen-
343
Andon, R. J. L.; Martin, J. F. J. Chem. Thermodyn. 1975, 7(6), 593. Csikos, R.; Pallay, I.; Laky, J. Proc. World Pet. Congr. 1980, 10(5), 167. Denbigh, K. G. Principles of Chemical Equilibrium, 4th ed.; Cambridge University: Cambridge, UK, 1981; Chapter 4. Farhat, A. M.; Mazhar, H. M. Arabian J.Sci. Eng. 1984,9(3), 221. Fenwick, J. 0.; Harrop, D.; Head, A. J. J. Chem. Thermodyn. 1975, 7(10), 943. Garibaldi, P.; Pecci, G.; Vicenzetto, F.; Razze, S. International Symposium on Alcohol Fuel Technology, Wolfsburg, D. F. R., Nov 1977; Conf. 771175. Gicquel, A.; Torck, B. J. Catal. 1983, 53(1), 9. Glastone, S. Thermodynamics for Chemist, 1st ed.; van Nostrand: New York, 1947; Chapter 13. Gupta, J. C.; Prakash, J. CEW, Chem. Eng. World 1980,15(8), 27. Harrington, J. A.; Brehob, D. D.; Schanerberger, E. M. Proc. Znt. Symp. Alcohol Fuels Technol., 3rd 1979, 111-53. Hawes, R. W.; Kabel, R. L. AZChE J. 1968, 14(4), 606. Heck, R. M.; McClung, R. C.; Witt, M. P.; Webb, 0.Znd. Eng. Chem. Prod. Res. Deu. 1981, 20(3), 474. Hougen, 0. A,; Watson, K. M.; Ragatz, R. A. Chemical Process Principles. Part ZZ Thermodynamics,2nd ed.; Wiley: New York, 1962; Chapters 25 and 26. Kabel, R. L.; Johanson, L. N. J. Chem. Eng. Data 1961, 6(4), 4. Obenaus, F.; Droste, W. Erdoel Kohle, Erdgas, Petrochem. 1980, 33(6), 271. Park, S. W.; Himmelblau, D. M. AZChE J. 1980, 26(1), 168. Perry, R. H.; Chilton, C. H. Chemical Engineer's Handbook, 5th ed.; McGraw-Hill: New York, 1973; Chapter 4. Reid, R.; Prausnitz, J.; Sherwood, T. The Properties of Gases and Liquids, 3rd ed.; McGraw-Hill: New York, 1977; Chapters 5 and 7. Reynolds, R. W.; Smith, J. S.; Steinmetz, I. Prepr.-Am Chem. SOC., Diu. Pet. Chem. 1975,20(1), 255. Setinek, K.; Prokop, Z.; Kraus, M.; Zitny, Z. Chem. Prum. 1977, 27(9), 401. Smith, J. M.; van Ness, H. C. Introduction to Chemical Engineering Thermodynamics, 1st ed.; Novaro S. A.: Juarez, Mexico, 1965; Chapter 13. Talbot, A. F. Proc.-Am. Pet. Znst. Refin. Dep. 1979, 58, 205. Taniguchi, B.; Johnson, R. CHEMTECH 1979, 9(8), 502. Tejero, J. Ph.D. Thesis, University of Barcelona, 1986. Yaws, C. L. Chem. Eng. 1976, 83(2), 107. Yaws, C. L.; Hopper, J. R. Chem. Eng. 1976, 83(12), 119.
sional K = thermodynamic equilibrium constant K, = ratio of fugacity coefficients K = equilibrium constant based on partial pressures K" = equilibrium constant based on molar fractions d T B E = methyl tert-butyl ether P = total pressure, atm pi = partial pressure of component i, atm e = gas constant, cal-mol-l.K-l So = standard entropy, cal.mol-'.K-' T = absolute temperature, K T , = reactor temperature, OC V = space velocity (total flow rate/catalyst weight),mo1.h-l.g-l yi = molar fraction of component i Greek Symbols a ,= significance level AGO = standard free-energy change of reaction, cal-mol-'
AGOf,+= standard free energy of formation of component i,
calqmol-' = standard enthalpy change of reaction, calsmol-' qfi = standard heat of formation of component i, calamol-' A H 0 , i = standard heat of vaporization of component i, cal-mol-l ASo = standard entropy change of reaction, cal.mol-'-K-' vi = stoichiometric coefficient of component i pi = fugacity coefficient of component i Subscripts E = MTBE e = equilibrium f = formation g = vapor phase I = isobutene i = component 1 = liquid phase M = methanol N = nitrogen v = vaporization Registry No. MTBE, 1634-04-4; CH,OH, 67-56-1; isobutene, 115-11-7; Amberlyst 15, 9037-24-5. A@O
Literature Cited Ancillotti, F.; Massi Mauri, M.; Pescarollo, E.; Romagnoni, L. J.Mol. Catal. 1978, 4(1), 37.
Received for review June 24, 1986 Accepted July 15, 1987
The Use of Biased Least-Squares Estimators for Parameters in Discrete-Time Pulse-Response Models N. L. Ricker Department of Chemical Engineering, BF-IO, University of Washington, Seattle, Washington 98195
T h e use of biased least-squares methods for the estimation of pulse-response coefficients in discrete-time process models is proposed. The two specific approaches considered here are partial least squares (PLS) and a method based on the singular value decomposition (SVD). Example applications include the estimation of the open-loop impulse and step responses of a pilot-scale anaerobic wastewater treatment process. The SVD and PLS methods give very similar results for the problems studied. T h e PLS method typically requires much less computer time, however. In order to use modern analytical techniques and algorithms for process control, one must provide a model of the dynamics of the subject process. Such models are usually linear and time-invariant and may be in either a state-space, transfer function, or "nonparametric" form. The third category includes pulse-response, step-response, and frequency-response formulations. Models can be ei0888-5885/88/2627-0343$01.50/0
.
ther continuous or discrete-time; the latter are convenient for computer control applications. The various discretetime models are reviewed in many texts (e.g., Astrom and Wittenmark (1984)). Ideally, one would like to develop process models from first principles. Most unit operations in chemical processes are too complex for this, however. In the case of an existing
0 1988 American Chemical Society
344 Ind. Eng. Chem. Res., Vol. 27, No. 2,.1988
process, an alternative is to develop an empirical model based on plant data. Such data can be obtained from observations during “normal” plant operation or from a series of planned experiments. Techniques for the identification of the model structure (i.e., order, number of intervals of pure time delay, etc.) and the estimation of adjustable parameters have been reviewed by many authors including Box and Jenkins (1976), Gustavsson et al. (1977)) and Ljung and Soderstrom (1983). Pulse-response models (and, equivalently, step-response models) appear in two contexts in control system applications. One is in the preliminary stages of estimation of a more parsimonious transfer function or state-space model as suggested by Box and Jenkins (1976). The other is direct use for prediction of process outputs in predictive control algorithms (e.g., Dynamic Matrix Control, Cutler and Ramaker (1979); Internal Model Control, Garcia and Morari (1982); Ricker (1985); Ricker et al. (1986)). An advantage of the pulse-response model is that there is no need to specify model structure. On the other hand, a large number of pulse-response coefficients is usually required. Their estimation can be difficult because of the high dimensionality of the model, and the use of a nonparsimonious model can lead to large prediction errors (see, e.g., Ljung and Soderstrom (1983), p 254). The pulse-response estimation method suggested by Box and Jenkins (pp 379-380) minimizes the problems caused by high dimensionality, but it is applicable only to systems in which the input may be represented by a stationary stochastic process. Experiments involving deterministic inputs (e.g., a step or a single pulse) usually fail to satisfy this assumption. The method suggested by Freedman and Bhatia (1985) can be used with deterministic inputs but calculates the coefficients using only the N most recent observations of the process output and 2N most recent values of the process input, where N is the number of coefficients to be estimated. The coefficients so determined are likely to be biased in typical process situations where the signal-to-noiseratio is poor. Also, their method is poorly suited to situations where N is large and/or the input signal is insufficiently exciting. (See additional discussion in example 1.) Wold et al. (1984) have suggested that modified leastsquares (LS) estimators such as “partial least squares” (PLS) might be used for estimation of coefficients in pulse-response models. In the following, two modified LS estimators are evaluated theoretically and applied to typical estimation problems.
Theory Linear Model Formulation. The usual discrete-time pulse-response representation of a linear system with m inputs, u, and a single output, y , is Yk+l
=
UkTh
+ dk+l
(1)
where h is a vector of pulse-response coefficients and dk+l is a disturbance. The subscript 12 denotes a sampling instant, t = kT, where t is the time and T i s the sampling period. Additional details are given in the Nomenclature section. It is assumed that each input variable is held constant between sampling instants (e.g., by a zero-order hold). Note that uk contains a sequence of vdues for each input and is of finite length, N , where N is the sum of the Ni values for i = 1, m. This assumes that the process is open-loop-stable; i.e., hij is assumed to be zero for j > N , (Garcia and Morari, 1982). Specification of Ni requires judgement and/or prior knowledge of the dynamic char-
acteristics of the process and will be discussed in more detail later. Estimation of Pulse-Response Coefficients. Given measurements of the inputs and the output a t a series of discrete points in time, we select M observations of the output and arrange them in the following manner: YT =
bkl
Yk2
...
YkMl
(2)
where kl,k,, ..., kw are the M selected sampling instants; they need not be consecutive. According to eq 1, for each element in y there is a corresponding sequence of N input values, uT. We arrange these input values in the following manner:
x = [ukl
uk2
...
UkmlT
(3)
where uki is the sequence of input values corresponding to Y k r . The original linear model, eq 1,may now be written in the form typically used for least-squares parameter estimation: y=XB+e
(4)
where B is a vector of N parameters to be estimated (in this case 0 = h ) and e is a vector of errors. If it is assumed that X is error-free, the expectation of y is X@,and the elements of e have zero mean, constant variance and are uncorrelated, then the standard least-squares estimate of @ is unbiased and is given by the normal equations
j3 = (XTX)-’XTy
(5)
which gives the minimum-norm estimate of e: G=y-XB
(6)
If X is poorly conditioned, however (see, e.g., Lawson and Hanson (1974) and Mandel, (1982)), the parameter estimates given by eq 5 will (1)change dramatically if elements of y are modified slightly (i.e., within the limits of the expected error) and (2) result in poor predictions of y when used with new X data. One can improve the conditioning of X by proper experimental design, but this is often difficult to arrange in a plant environment. For example, one of the measured inputs may be coming from a poorly controlled upstream process. The large number of parameters usually required in an pulse-response model also leads to ill-conditioning. Estimation via the Singular Value Decomposition. Lawson and Hanson (1974) give several alternatives to the use of the normal equations for least-squares problems. One of these involves the singular value decomposition (SVD) of the X matrix:
x =UZVT
(7)
where U w x M and V N x N are orthogonal matrices and Z is M by N with the structure
The diagonal matrix S ( r X r ) contains the r nonzero singular values of X, ol...or, where r = rank (X). These are positive and in order of decreasing magnitude. The ratio ol/or is the condition number of X; a large value indicates a poorly conditioned problem. It is important that the elements of X be properly scaled (see example 1). The linear model, eq 4, can be written in terms of the SVD matrices
+e
y = UZVr@
(9)
Ind. Eng. Chem. Res., Vol. 27, No. 2, 1988 345
or since U is orthogonal, UTy = ZVT@ + UTe
(10)
Thus, the SVD can be viewed as a transformation to orthogonal variables. A variation in a given element of the transformed,parameter vector VT@ affects (at most) one element in the transformed output vector UTy. The estimate of the least-squares residual, J, is now
J = &*& = (UTt5)T(UTt5) = (E&- $)T(Z&- $) (11) where
It is easy to show that the estimate, 8, that minimizes the residual is given by & =
Z+$
(14)
or where
-s-’ 02+ = 10
01
Assuming that rank (X) = N , which is the usual case, the estimates given by eq 15 are the same as those given by the normal equations. If X is poorly conditioned, however, the SVD provides diagnostic information that can be used to (1) indicate whether the data is of sufficient quality to provide reasonable parameter estimates and (2) minimize the sensitivity of the estimates to errors in the data. These issues are discussed in detail by Lawson and Hanson (1974) and Mandel (1982). One strategy for minimizing the sensitivity of the estimates to data errors is to stipulate that one or more of the smaller singular values of X are essentially zero; i.e., X is effectively singular. It is then impossible to determine N linearly independent parameter estimates from the data; one or more parameters must be chosen arbitrarily in order to estimate the rest. If, for example, we decide that the effective rank of X is re < N , then N - re parameters must be chosen arbitrarily. The usual choice is to set N - re elements of & equal $0 zero, which results in the minimum-norm estimate of B. Note, however, that if element &i is set to zero, the residual, J, increases by Thus, there is a tradeoff between a reduction in the number of independent parameters to be estimated and the resulting poorer “fit” of the model to the data. In most (but not all) cases, the $12 values associated with the smaller singular values are negligible; i.e., the corresponding &ivalues do little to improve the fit. The values associated with the smaller singular values also have relatively large estimated variances (Mandel, 1982), and a large variance in an element qf & will increase the variances of all of the elements of B. For these reasons, it is usually best to set selected va!ues to zero, even though this causes a statistical bias in 8. The choice of which values to retain and which to set to zero requires judgement on the part of the investigator. One can, for example, compare the f i r 2 values with the expected variance in y (obtained, e.g., through an error analysis) and retain only the GI values that reduce J significantly. Another approach is to eliminate all itIvalues corresponding to uIless than a specified fraction (e.g., 10%) of the maximum singular value, o1.
A third approach, which is appropriate if the resulting model is to be used mainly for prediction, is to partition the original data into subsets. The parameters are estimated using one subset, and the resulting model is used to predict the y values in the other. The quality of fit in the prediction step, measured, e.g., by the mean squared error (M.S.E.), indicates how many non-zero GI values should be retained. Typically, the M.S.E. is a minimum for 1 5 n I N , where n is the number of non-zero values retained. For the estimation of pulse-response coefficients from plant data where N is of the order of 100, n will usually be less than 30, often much less. Partial Least-Squares Estimation. Partial Least Squares (PLS) is another techniques for solving ill-conditioned least-squaresproblems. It was developed by Wold (1982) and has been described in recent review by Wold et al. (1984), Naes and Martens (1985), and a tutorial by Geladi and Kowalski (1986). As pointed out by Lorber et al. (1986),PLS is a variation of the NIPALS (noniterative partial least squares) algorithm, which computes singular vectors and singular values of a data matrix, X, by an iterative procedure. Lorber et al. also examined the relationship between PLS and methods based on the SVD. The present discussion will be restricted to the special case of a single dependent variable. The more general case is covered in the papers cited above. The data matrix, X, ca_n always be representeg by the product of two matrices, TMxr (the “scores”)and PNxr(the “loadings”): X = T p T = tlpIT+ tGzr + ... + trprT
(17)
where = rank (X) and ti and pi are the ith columns of T and P, respectively. In PLS, X is written in the following related form:
X=TPT+E (18) where TMxpand PNXpcontain the first p columns of and P, p is the number of “latent variables” (analogous to the number of non-zero &I values retained in the SVD analysis), and EMxN is a residual matrix that is assumed to describe random errors in X and/or components of X that have no predictable effect on y . In the usual PLS formulations, the columns of T are mutually orthogonal and the columns of P are normalized to be of length unity. This form will also be used here. A geometric interpretation of the PLS representation is given by Geladi and Kowalski (1986). Relating X and y in the same linear form as in eq 4 we get y = TPT@+ f =Tb+f (19) where bpx, (=PT@) is a vector of parameters to be estimated and fMX1 is a vector of residuals, Following the usual least-squares analysis, the estimate, b , that minimizes the norm of f is
6 = (TTT)-lTTy
(20)
Since the columns of T are orthogonal, the ith element of b is just
6, = tITY/(tlTtI)
(21)
Given a new set of X values, one can predict the corresponding y values by performing the PLS decomposition o,n the new X and using the estimated parameter vector, b , in eq 19 with f = 0. In the present work, however, y e need the values of the original regression coefficients, @. Unfortunately,_unlessN , = p , there is no unique _relationship between b and @. The minimum-norm 8 can be
346 Ind. Eng. Chem. Res., Vol. 27, No. 2, 1988
computed using the SVD in the following manner: = Vpz+,upTb
(22)
where the subscript ”P”denotes that the SVD matrices are to be derived from PT(rather than X). Since rank (P? = p and p 100) but the required number of latent variables, p , is small, the PLS calculations take much less computer time than the SVD of X. For example, the SVD of an X matrix with N = 100 and M = 140 required approximately 7 min on a VAX 11/750, whereas the computation of the first five PLS latent variables required about 20 s. One might argue that the entire SVD is rarely needed, and it is possible to compute only the first p singular values f1
= y - b,t,
and vectors by using the NIPALS method. This converges very slowly, however, if the ratio of successive singular values is close to unity (which is usually the case for the estimation of pulse response coefficients). Relationship of the PLS and SVD Approaches. Another way to see how PLS works is to examine its relationship to the SVD method. Lorber et al. (1986) have shown that the first score vector found by the PLS algorithm can be written in terms of the SVD of X:
i.e., t is a linear combination of the first r columns of U in which the columns corresponding to large singular values and large rLi values dominate. Similarly, the ith score vector can be thought of as a linear combination of the first r - i + 1 columns of the U matrix from the SVD of the (i - l),tresidual matrix. Recall that the largest singular values correspond to the SVD components of X that have the best signal-to-noise ratio. The largest rCli values correspond to the SVD components that are most strongly correlated with y. We thus see that PLS tends to ignore SVD components that are poorly correlated with y and/or are likely to be mainly noise. In effect, PLS also assumes that the noise is distributed over all SVD components rather than just in those corresponding to the last r - p singular values. Further discussion of these points is contained in the next section. The theoretical development of both methods involves a linear transformation of variables, followed by the estimation of a number of transformed parameters that is (usually) much smaller than the original number. Estimates of the original parameters are derived through linear combinations of the transformed parameters. In other words, the SVD and PLS methods automatically impose linear equality Constraints on the allowable values of the original parameters. One might obtain a similar result by imposing predefined linear equality constraints. For example, rather than estimating all Ni values, hi, one might select a subset to be estimated and then stipulate that the rest are to be obtained by linear interpolation between the estimated values.
Examples 1. Hypothetical Process. A hypothetical single-input, single-output process has a true pulse response, hT = [O.O -0.5 3.0 1.0 0.31; all values beyond the fifth value are zero. Thus, N , = 5 = N. The process gain, which is the sum of the pulse response coefficients, is 3.8. Suppose that the input variable (which is assumed to be held constant during each sampling time period) is varied in a predetermined manner for five successive periods. The output is measured a t the end of the sixth period. The corresponding values then constitute a row of the X and y matrices. This procedure is repeated for a total of M = 6 samples. The resulting X matrix is given in Table I as the “calibration set”. The corresponding “true” y ( = X h ) and a “noisy” y are also shown in Table I. The latter was derived by adding zero-mean Gaussian noise with a standard error of 0.05 to the true y. A second set of M = 6 samples of the input and output are included in Table I as the “test set”. In this case, only the true y is given. Note that the columns of X in the calibration set have roughly zero mean and equal variance. This helps to minimize problems with roundoff error in the decomposition procedures. When using process data, it is best to scale the X matrix so that it has these properties. One way to achieve a near-zero mean is to difference the X and y
Ind. Eng. Chem. Res., Vol. 27, No. 2, 1988 347 Table I. Data from the Hypothetical Process calibration set true -1.20 -3.80 -4.80 1.70 3.83 4.30
test set
x
X
YL
noisy -1.37 -3.72 -4.64 1.73 3.73 4.25
-1.0 -0.9 1.0 0.0 1.0 0.0
X,,Z
x,,3
-1.0 -1.0 1.0 0.0 1.0 0.0
-1.0 -1.0 -1.0 1.0 1.0 1.0
%4
1.0 -1.0 -1.0 -1.0 1.0 1.0
Table 11. Results of SVD Analysis for Hypothetical Process i Ui tb;2IrL*rL 1 3.681952 0.692 429 2 3.018513 0.044 080 3 1.829314 0.263 315 4 0.070 700 0.000 004 5 0.027 829 0.000 171
data (as described by Box and Jenkins (1976)). Another is to calculate the mean of each input signal and subtract it from each sample of that signal. It is then necessary to estimate the steady-state value of the output corresponding to the mean of the inputs. This must be subtracted from all the y values. If the variances of the resulting columns differ significantly, they can all be normalized to unit variance (by division of each column by its variance). Of course, this also scales the parameter estimates. In order to recover the unscaled estimates, one must divide Biby the variance of the ith column of X. The results of the SVD of the calibration X are shown in Table 11. Whereas the first three singular values are of the same order of magnitude, the last two are much smaller. Visual examination of X shows that the first two columns are nearly identical, as are the last two. Hence, although rank (X)= 5, X is poorly conditioned and has an effective rank of 3. Given the true X and y values shown in Table I, the normal equations (or PLS with p = 5) yield the true values of h . If, however, one uses the true X with the noisy y , the standard least-squares estimates rounded to two decimal places are hT = [1.62 -2.00 3.00 2.78 -1.521. The fit of the noisy data is nearly perfect (since M = N)-the M.S.E., defined here as ST& / M , is 3.75 X 10". The estimated gain of 3.88 is close to the true gain of 3.8, but the individual coefficients oscillate between large positive and negative values. This is typical behavior when one attempts to estimate too many parameters from inadequate data. Note that the method proposed by Freedman and Bhatia (1985) is similar to the above but with M = N to get a perfect fit. Further evidence that the estimated parameters are erronous comes when we attempt to predict the y values of the test set, i.e., using y = X h where X is now the test set matrix in Table I. The M.S.E. for the test set is 5.25, much higher than for the calibration set. We now attempt to minimize the problems caused by the poorly conditioned X of the calibration set. Table I11 shows how the M.S.E. varies with the number of components retained in the SVD analysis and the number of latent variables used in PLS. Unless otherwise noted, successive SVD components are used, i.e., an entry of n = 3 in column 1 of Table I11 means that the first three SVD components are retained. The M.S.E. is a minimum (at the same M.S.E. value) for p = 3 latent variables in PLS and for n = 3 in the SVD method, Le., for a number of components equal to the effective rank of X. Also, the
'1,5
1.0 -1.0 -1.0 -1.0 1.1 1.0
true
Yl
1.2 3.2 4.8 -1.2 -4.2 1.2
%,,1
x1,z
x,,3
'1.4
x,,5
1.0 -1.0 -1.0 1.0 1.0 -1.0
1.0 1.0 -1.0 -1.0 1.0 1.0
1.0 1.0 1.0 -1.0 -1.0 1.0
-1.0 1.0 1.0 1.0 -1.0 -1.0
-1.0 -1.0 1.0 1.0 1.0 -1.0
Table 111. Comparison of SVD and PLS for Example 1 mean mean squared error squared error norp SVD PLS norp SVD PLS 0 9.20 9.20 4 0.31 2.58 1 7.81 6.55 5 5.25 5.25 2 7.34 2.62 2" 0.32 3 0.23 0.23 "SVD Components 1 and 3 were used.
corresponding parameter estimates are hT = [-0.24 -0.22 2.98 0.59 0,611 for both methods. (The fact that the M.S.E. and h values are identical for n = p = 3 is an artifact of the example and would not happen in general.) The estimated gain of 3.72 is again close to the true value, and in contrast to the use of n = p = 5, the estimates of the individual coefficients are much closer to the true values. The M.S.E. value for p = 1 in PLS is noticably lower than that for n = 1 in the SVD method. This is typical since, as explained previously, the PLS estimate for p = 1 is weighted to give a good prediction of y , whereas the first component of the SVD analysis depends only on the properties of X and, in general, may be uncorrelated with y . Table I1 shows that for this example the first SVD component correlates about 69% of the variance in y , so the difference between the two methods for p = n = 1 is small. Notice, however, that the second SVD component correlates only 4% of the variance in y , whereas the third component correlates 26%. We would thus expect the M.S.E. for p = 2 in PLS to be significantly lower than for use of the first two SVD components, and this turns out to be the case. On the other hand, if we retain only the first and third SVD components, the M.S.E. for SVD is lower than for p = 2 in PLS, as shown in Table 111. The main point here is that if one arbitrarily decides to use a small number for n and p , PLS will probably give a better prediction than the SVD method. If, however, one is free to minimize the M.S.E. by optimizing n and p , there will be little if any difference in the predictions. Similar conclusions were reached by Naes and Martens (1985). In general, the number of pulse-response coefficients needed to model the process will be unknown at the start of the estimation problem. Some exploration will be required to determine the appropriate number. One approach is to search for the number of coefficients for which the M.S.E. is a minimum. In the present case, for example, we might suspenct that the first two estimated h values are essentially zero. If we eliminate the first column in X for the calibration and test sets and repeat the above calculations, the minimum M.S.E. is 0.187 at n = p = 3 with hT = [-0.46 2.98 0.60 0.611; Le., elimination of the unnecessary first coefficient improves the estimates of the remaining coefficients and the predictive abilities of the model. for most plant data, the results will be insensitive to modest changes in the number of coefficients, especially when coefficients in question represent the "tail" of the
348 Ind. Eng. Chem. Res., Vol. 27, No. 2, 1988
pulse response rather than the beginning. Note that the elimination of the first coefficient is equivalent to the assumption of a pure delay of one sampling interval. In general it is good practice to attempt to estimate the delay in order to improve the estimates of the remaining parameters. Methods for doing this are described by Box and Jenkins (1976), for example. 2. Anaerobic Wastewater Treatment Process. Slater (1987) has collected dynamic response data from a pilot-scale anaerobic wastewater treatment process. Details are available in Slater’s thesis. In the experiments to be analyzed here, the influent was a solution of 4000 ppm by weight butyric acid with trace nutrients and buffering agents added to provide near-optimal conditions for bacterial growth. The objective was to determine the dynamics of the gas-phase composition with respect to variations in the feed rate of the influent, which was varied in a pseudorandom binary sequence (PRBS; see, e.g., Harris and Mellichamp (1980)). Slater used the PRBS developed by Van den Bos (1967) which is designed to excite the process at six specific frequencies. Using the formulas given by Harris and Mellichamp (1980), Slater chose a sampling period of 3.5 min. This placed the six frequencies in the range appropriate for the expected dominant time constant of 2 h. (This sampling period is much shorter than would be necessary in a process control application. If, for example, the closed-loop response time was on the order of 1, the rule of thumb suggested by astrom and Wittenmark (1984) leads to a sampling period of 15 min, which would reduce the number of pulse response coefficients by a factor of 4 and would improve estimation accuracy.) Gas compositions were measured by an on-line gas chromatograph equipped with a thermal conductivity detector to monitor methane and carbon dioxide and a special detector to monitor reduced gases, primarily hydrogen at the parts per million level. The response of the hydrogen concentration was of particular interest since it has been speculated that it might provide early warning of problems in the reactor. Slater’s thesis contains the results of two independent experiments (carried out approximately 1 week apart) in which hydrogen concentration varied in response to variations in influent flow rate. In the first case (the calibration set; Slater’s MFBI No. 2), the influent variations were f1.92 L/day about a nominal steady state of 4.67 L/day, whereas in the second case (the test set; Slater’s MFBI No. 1) they were f1.16 L/day about a nominal steady state of 4.61 L/day. Each experiment lasted 1.5 days. The calibration set consisted of 103 values of the H2 concentration beginning 11 h after the start of the PRBS and 14 min apart (Le., every fourth sampling interval). The test set consisted of 63 values beginning at the same point but 17.5 min apart (every fifth sampling interval). The calibration set was analyzed by using both the SVD and PLS methods. The number of pulse-response coefficients was varied between 60 and 120 in increments of 10. For each case, the number of SVD components (or PLS latent variables) minimizing the M.S.E. of the test set was determined. Finally, the number of pulse-response coefficients minimizing the M.S.E. of the test set was determined, which occurred at 90 pulse-response coefficients for both methods. The reduction in M.S.E. was 54.8% for PLS and 58.7% for the SVD method (relative to the assumption of no response, i.e., h = 0). The “process noise” in the H2 concentration was measured independently during a 24-h period of steady-state
H
2 C 0
n C
e n
t r E
t 1
0
50
100
150
200
250
300
3 0
(mln)
TIIn.3
Figure 1. Step response models for SVD and PLS methods. H
1
,
1
2
Upper
bound
- PL Lo Sw e-r
s r r o ~b o u n d
- ’
’
.. . 0 6
. .,
.
90 Coats
0
0
-0
0
n 5
1
1
10
15
1
20
1
25
1
1
30
35
l
40
l
I
I
45
50
55
I
60
o a t 0 Point
\
Figure 2. PLS model fit to test set. H
0 4
2 0 3
, ,,
0 2 ,
.I
0 1 -
- suo
0i
.... SUO SUO -. ‘ S U O
f
n
-0
“112
.
n. .2 23 3 n n=36
I
I
.
.
’’,.
i
50
100
150 Time
200
250
300
350
(mi”)
Figure 3. Effect of number of SVD parameters on step response.
operation and found to account for 25.4% of the variance in the test set. Thus, approximately 20% of the variance was unexplained by the linear models. Possible explanations include process nonlinearities and/or a lack of reproducibility in the experiments; both are notorious problems with biochemical systems. The use of 80 or 100 pulse-response coefficients gave nearly the samp percentage reductions as 90; i.e., the results were insensitive to this choice. Figure 1 shows the step response determined by using the PLS and SVD methods with 90 pulse-response coefficients. (The H2 concentration in these figures has been normalized; see Slater’s thesis for original data.) The agreement between the two methods is excellent. The resulting fit of the test set data for PLS is shown in Figure 2. The PLS prediction is generally within or just outside the error bounds defined by the 63 data points f one estimated standard deviation unit. The results for the SVD method were, if anything, slightly better than those for PLS. It was found that the SVD method was relatively insensitive to the choice of the number of SVD components. Figure 3 shows, for example, that both n = 12 and n = 23 (the “best” value) give approximately the same step re-
Ind. Eng. Chem. Res., Vol. 27, No. 2, 1988 349 H 2
0 4
0
sm
10m
150 200 T , m r (m,”)
250
300
350
Figure 4. Effect of number of PLS latent variables on step response.
sponse, although we begin to see superimposed oscillations a t n = 36 and above. The value giving the minimum M.S.E. was, on the average, 27.5% f 2.5% of the number of pulse coefficients used. Figure 4 shows, on the other hand, that the PLS step response varies dramatically over the range p = 3 to p = 9. Here the minimum M.S.E. was a t 5.9% f 1.0% of the number of pulse-response coefficients and was p = 6 for the case shown in Figure 4. It would be interesting to compare the step responses obtained above with data from experiments in which a step input was used in place of the PRBS. Slater performed several identical step tests but their reproducibility was poor and no conclusive comparison could be made with the PRBS experiments. The main problem with the step tests was that it was impossible to achieve true steady-state operation; significant unanticipated disturbances always occurred during the 48 h needed for a step test, obscuring the response to the initial change in the influent flow rate. The results from independent experiments using the PRBS were much more consistent. I t should be noted that Slater’s process, although fundamentally complex, was well instrumented and controlled relative to most industrial processes. Simple puke and step tests are likely to be difficult to use effectively in industrial settings because of problems with signal-to-noiseratios and unanticipated disturbances. The use of a continuous sequence of intentional disturbances and associated estimation techniques can overcome these problems.
Discussion and Conclusions Both the SVD and PLS methods can be used to estimate coefficients in a linear pulse-response model. Such models can be used to approximate many dynamic systems (e.g., Astrom and Wittenmark (1984)). If used as suggested above, the methods give similar results. The theoretical basis for the SVD method is more firmly established, but the SVD method generally requires much more computer time than PLS, especially when the number of coefficients to be estimated is large (i.e., >loo). In such cases, the SVD required on the order of 10 min on a VAX 111750,whereas PLS required 0.5 min. This can be significant since the methods are most effective if the user is able to interact with the program, checking the shape of the estimated pulse response, the M.S.E. for prediction, etc., and then varying the number of coefficients and/or latent variables to improve the model. Accuracy of the resulting estimates is influenced mainly by the quality of the data. As with other modeling approaches, accuracy improves when the effect of unknown process disturbances and measurement noise on y and X is small, and X has a frequency content appropriate for the expected use of the model. In addition, the duration
of the experiment must be such that it is possible to include an appropriate number of columns in the X matrix (corresponding to the point where hid 0). This can be difficult to achieve in the case of processes with long time constants, in which case a transfer function model is a better choice. The PLS and SVD methods can be used with both stochastic and deterministic inputs. For stochastic inputs, however, methods based on autocorrelation and crosscorrelation analysis (Box and Jenkins, 1976) may be more convenient. In either case it is important to take special precautions for data obtained under closed-loop conditions (Gustavsson et al., 1977). It may be possible to improve predictions of the PLS algorithm by modification of the method used to calculate the latent variables. Lorber et al. (1986) suggest an approach that works better for a specific class of problems, but a more general solution would be useful. Finally, it should be possible to use PLS for the estimation of parameters in transfer function models, e.g., as in a typical extended-horizon adaptive control application.
Acknowledgment This material is based upon work partially supported by the Center for Process Analytical Chemistry at the University of Washington. Helpful comments were provided by T. Sim, J. B. Callis, and B. R. Kowalski.
Nomenclature b = vector of PLS parameters to be estimated, length p (= P b, = ith element of b dk = value of the disturbance at the kth sampling interval e = vector of errors in eq 4, length M f = vector of errors in PLS modeling, eq 19, length M E = M X N matrix of residuals in the PLS decomposition of X (see eq 18) k = given sampling instant hT = [hl,l h 1 , ~ ... h l , N l ~ Z , I haNz h m , l hm,~”,] h,, = jth pulse-response coefficient relating the ith input to the output J = least-squares residual (see eq 11) n = number of aj values retained in the SVD method N, = truncation point for the ith input variable (see definition of h T ) p = number of “latent variables” used in PLS model pi = column i of the loadings matrix, P P = N X r “loadings matrix” defined in eq 17 P = N x p matrix derived from the first p columns of P; used in PLS modeling (see eq 18) r = rank of X re = effective rank of X S = diagonal r X r SVD matrix, =diag ( o l ,...,or) t, = ith column of the scores matrix, ‘ilength ’, M ‘i” = M x r “scores”matrix in the decompositionof X defined in eq 17 T = M X p matrix derived from the first p columns of and used in PLS model *e*
UkT = IU1,k-l %,k-2 Um,k-N,l u , , k = value of the
U l , k - N 1 %,k-1
1..
U2,k-N2
.*a
% z , k - l .**
ith input at the kth sampling instant U = orthogonal M X M SVD matrix defined in eq 7 V = orthogonal N X N SVD matrix defined in eq 7 X = M X N matrix of input values arranged as in eq 3 Y k = value of the output at the kth sampling instant
Ind. Eng. Chem. Res. 1988,27,350-352
350 Greek Symbols
= transformed parameter vector in SVD method (see eq 12), length N ai= ith element of Q B = vector of N parameters to be estimated J/ = transformed output vector in SVD method (see eq 13), Q
length M $i= ith element of J, ai= ith singular value of X and ith element on the diagonal of s 2 = M X N SVD matrix defined in eq 7 E+ = pseudoinverse matrix defined in eq 16 Superscripts T = vector or matrix transpose *
= estimate
Literature Cited Astrom, K. K.; Wittenmark, B. Computer Controlled Systems; Prentice-Hall: Englewood Cliffs, NJ, 1984. Box, G. E. P.; Jenkins, G. M. Time Series Analysis, Forecasting and Control, 2nd ed.; Holden Day: San Francisco, 1976. Cutler, C. R.; Ramaker, B. L. Paper presented a t the 86th National AIChE Meeting, April 1979. Freedman, R. W.; Bhatia, A. Proceedings of the 1985 American Control Conference, Boston, 1985, p 220. Garcia, C. E.; Morari, M. Ind. Eng. Chem. Process Des. Deu. 1982, 21, 308. Geladi, P.; Kowalski, B. R. Anal. Chim. Acta 1986, in press.
Gustavsson, I.; Ljung, L.; Soderstrom, T. Automatica 1977, 13, 59. Harris, S. L.; Mellichamp, D. A. Ind. Eng. Chem. Process Des. Deu. 1980, 19, 166. Lawson, C. L.; Hanson, R. J. Solving Least Squares Problems; Prentice-Hall: Englewood Cliffs, NJ, 1974. Ljung, L.; Soderstrom, T. Theory and Practice of Recursive Identification; MIT: Cambridge, MA, 1983. Lorber, A.; Wangen, L. E.; Kowalski, B. R. J . Chemometrics 1986, in press. Mandel, J. Am. Stat. 1982, 36(1), 15. Naes, T.; Martens, H. Commun. Statistics.-Simula. Computa. 1985, 14, 545. Ricker, N. L. Ind. Eng. Chem. Process Des. Deu. 1985, 24, 925. Ricker, N. L.; Sim, T.; Cheng, C. M. Proceedings of the 1986 American Control Conference, Seattle, June 1986; Paper WP1-3:30, 355. Slater, W. R. M.S. Thesis, University of Washington, Seattle, 1987. Van den Bos, A. Proceedings of the 1st IFAC Symposium on Identification in Automatic Control Systems, Prague, June 1967; Paper 4.6. Wold, H. Systems Under Indirect Observation, Part II; Jlreskog, North Holland Amsterdam, 1982; pp 1-54. K. G., Wold, H., Ms.; Wold, S.; Albano, C.; Dunn, W. J., 111; Edlund, U.;Esbensen, K.; Geladi, P.; Hellberg, S.; Johannson, E.; Lindberg, W.; Sjostrom, M. Chemometrics-Mathematics and Statistics in Chemistry; Kowalski, B. R., Ed.; D. Reidel: Dordrecht, Holland, 1984; pp 17-96.
Received for review March 20, 1987 Accepted October 8, 1987
Preparation of Aqueous Solutions of Hydrofluoroaluminic Acid and the Preparation of Some of Its Salts Maurice Adelman Department of Chemical Engineering and Applied Chemistry, University of Toronto, Toronto, Ontario, Canada M4H 155
Both natural cryolite and cryolite recovered from the stack gases, in the production of aluminum, have been broken down t o form a hydrofluoroaluminic acid solution. The sodium salt was produced from this solution by neutralizing it with sodium carbonate solution and is identical with natural cryolite. In addition to the sodium salt, the potassium, lithium, and aluminum salts also were produced. The two most important chemicals required for the manufacture of aluminum are bauxite, the hydrated oxide of aluminum, and cryolite (Na3AlF6),which, at temperatures near lo00 OC, acts as the solvent and electrolyte for the dehydrated bauxite. Since the temperatures used in this process are around 940-980 O C and cryolite has a high vapor pressure at these temperatures, a large amount of cryolite goes up the stacks with other materials. A large amount of solid material is trapped and removed by centrifuges. In spite of the high percentage of cryolite in this material (some analyses indicate as high as 85% cryolite (Alcan International, 1986)) (Figure l ) , nothing has been done with this material up to the present except to bury it a t a cost of around $20.00/ton (Alcan International, 1987). If the cryolite in this material can be recovered as a water solution of hydrofluoroaluminic acid a t a reasonable price, it would be very useful in terms of getting rid of a possibly hazardous material and of opening the possibility of obtaining salts of hydrofluoroaluminic acid that would be useful in making the melt more efficient (Dewing, 1974). Although cryolite is quite insoluble in water, it is known to be soluble in strong NaOH solution. However, this 0888-5885/88/2627-0350$01.50/0
would require too high a cost in getting back the cryolite or converting it to hydrofluoroaluminic acid. However, it is known that the solubility of cryolite in 1.5 wt % HCl a t 20 "C is 0.3 g/100 g of solvent (Kirk-Othmer Encyclopedia of Chemical Technology, 1,980). Therefore, if one can make such a solution and react it with a material, the sodium salt of which is less soluble than cryolite, it should be possible to produce a solution of hydrofluoroaluminic acid:
-
Na3[A1F6]+ xH+OH3Na+ + H3[A1F6]+ 30H-
+ ( x - 3)HOH
(a)
If very strong cation-exchange resins are used such as Amberlite 15 or Amberlite IR-120 plus or IR200, then
-
3Na+ + H3[A1F6]
(resin)H+
Na+(resin) + H3[A1F6] (b)
Since the sodium salt of the resin is more insoluble than Na3AlF6,except at very low pH, reaction b will move to the right until the amount of hydrofluoroaluminic acid makes the pH so low that the reverse reaction takes place: (resin)Na + H+
-
(resin)H+ + Na+
0 1988 American Chemical Society
(C)