Ind. Eng. Chem. Res. 1996, 35, 2782-2789
Recursive Nonlinear System Identification Using an Adaptive WaveARX Network

Junghui Chen*
Department of Chemical Engineering, National Tsing-Hua University, Hsin-Chu, Taiwan, R.O.C.

Duane D. Bruns†
Department of Chemical Engineering, University of Tennessee, Knoxville, Knoxville, Tennessee 37996-2200

* Author to whom correspondence should be addressed. E-mail: [email protected].
† E-mail: [email protected].
The recently developed WaveARX neural network has its methodology extended to include net adaptation and on-line implementation in real time. The net evolves by changing its architecture: generation, annihilation, or change in location of neurons. Systematic design procedures remove the traditional network problems of having no guidelines to define the architecture and of the often strong dependence of the trained network parameters on their initial conditions. In addition, the design procedures provide the network information needed for evolution. Two demonstrations are implemented on-line using process simulations of a chaotic map and a pH CSTR model. These illustrate several capabilities of the new network, along with comparisons to other identification techniques.

1. Introduction

Process identification is of great interest to practitioners in the field of process control. Available information for identification is used in basically two ways. One way is to collect a batch of data and, in a separate off-line procedure, use the batch data to establish a model. The other way, which is the focus of this paper, is to deal with on-line data and update the model as new data are obtained. On-line applications and adaptive models are, in general, more challenging to implement but are considered more important than off-line methods. This is especially true when we need to keep track of the process behavior in real time. Many situations call for adaptive identifiers, including time-variant processes, changes in operating conditions, the encounter of new or different disturbances, and the detection of irregularities. It is important to note that on-line data collection and process characterization typically reflect only the current local operating region. Therefore, a recursive real-time algorithm is probably the most suitable for tracking and predicting nonlinear relationships. In this study, a new on-line adaptive algorithm based on the recently developed WaveARX network is presented. The proposed model identification algorithm, the adaptive WaveARX network, extends our preceding research (Chen and Bruns, 1995), which considered only off-line identification of static and dynamic models.

On-line identification methods using the artificial neural network (ANN) have been widely developed in recent years. For example, Bialasiewicz and Ho (1991) approximated processes by a linear state space model with variable parameters and used a stochastic neural identification algorithm to adjust those parameters. Also, an on-line radial basis function network (RBFN) was integrated with clustering techniques and a least-squares algorithm (Chen and Billings, 1992; Chen et al., 1992). The clustering was used to determine the location of each neuron, while the least-squares algorithm was used to estimate the
parameters of the network. In another example, a recurrent neural network was combined with iterative least-mean-square learning algorithms to adapt the network weights (Seidl and Lorenz, 1994). However, this research shares a common attribute found in most on-line schemes: a fixed network structure is first established, which allows little freedom for adjusting the number of neurons. Such a frozen architecture may cause overfitting when a network uses more neurons than needed; on the other hand, a network without enough neurons is likely to produce poor approximations of the process. Other researchers have proposed adaptable network architectures, but their studies were confined to off-line applications (Fahlman and Lebiere, 1991; Bhat and McAvoy, 1992; Le Cun et al., 1990). This is discussed in more detail in section 3.2. These observations motivate us to develop the adaptive WaveARX network, an adaptive technique for real-time identification.

The objectives of the adaptive WaveARX network formalism, when applied to a given process, are (a) to provide a dynamic network architecture for on-line applications (more specifically, a dynamic architecture means the network can self-adjust the number of neurons and update the parameters according to changes in the process behavior), (b) to produce a compact form that effectively approximates models (in other words, the adaptive network not only can change the number of neurons but also keeps the number of neurons at a minimum when modeling the dynamic process behavior), and (c) to predict the responses of time-variant processes.

The paper is organized as follows. Section 2 briefly reviews the conceptual framework of the WaveARX network. Section 3 investigates how the adaptive technique is incorporated into the WaveARX network for on-line recursive identification. To test the capability of the adaptive WaveARX network, two simulation cases are studied in section 4. Finally, conclusions are drawn and future research is suggested in section 5.

2. WaveARX Network

The WaveARX network has been shown to be an excellent algorithm which successfully identifies process
behavior containing linearity, nonlinearity, or a combination of both (Chen and Bruns, 1995). It incorporates wavelet transforms into a traditional feedforward network. The wavelet transform offers systematic steps to construct the network architecture and provides excellent initial values for the network parameters. This greatly reduces the computation needed for training. Often no optimization is needed, as the model with the initial values of the network parameters meets the error criterion. With a local wavelet activation function, the systematic design synthesis also eliminates some problems associated with the traditional multilayer perceptron, such as slow convergence, the need for large training data sets, and long training time. In this section, we briefly review the WaveARX network.

2.1. Architecture of the WaveARX Network. The WaveARX network consists of two parts: the wavelet network and the ARX model. The wavelet network, which is used to identify the nonlinear characteristics of a process, is a three-layer feedforward neural net. Each neuron of the hidden layer contains a wavelet function which serves as the activation function. Connected in parallel with the wavelet network is the ARX model. This linear modeling section captures the linear characteristics of a process. The combination of the wavelet network and the ARX model forms a parallel identification algorithm, demonstrated in our previous research to be effective and efficient at capturing and isolating the linear and nonlinear contributions to the process behavior. The WaveARX network can be mathematically defined as
$$f(x) = \underbrace{\sum_{k,l}^{N} w_{kl}\,\psi_{kl}\!\left(\frac{\lVert x - t_{kl}\rVert}{s_k}\right)}_{\text{wavelet network}} + \underbrace{c^{T}x + c_0}_{\text{ARX}} \qquad (1)$$
where ψ_kl is the wavelet function (k ∈ Z and l ∈ Z^n), x ∈ R^n represents the input variables, t_kl are translations, s_k are scales, f(x) is the function or output variable to be approximated, w_kl are the wavelet weighting coefficients, the c vector contains the linear coefficients, and c_0 is a constant. (For a dynamic process, x consists of the past inputs and/or past outputs.) Each wavelet function, ψ_kl, is further defined as
$$\psi_{kl}(x) = a^{-k/2}\,\psi(a^{-k}x - lb) \qquad (2)$$

Here a and b are the discrete scale and translation step sizes, respectively, i.e., s_k = a^k and t_kl = l a^k b. The scales relate to frequency characteristics, while the translations relate to time characteristics (Daubechies, 1992). The binary partition, a = 2 and b = 1, is a popular choice.
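To make eqs 1 and 2 concrete, the following is a minimal sketch of how a WaveARX output could be evaluated. The Mexican-hat mother wavelet and the function names here are illustrative assumptions of ours; the paper does not fix a particular mother wavelet at this point.

```python
import numpy as np

def mexican_hat(r):
    # Mexican-hat mother wavelet; an illustrative choice of psi,
    # not one specified by the paper at this point.
    return (1.0 - r**2) * np.exp(-(r**2) / 2.0)

def wave_arx(x, neurons, w, c, c0, a=2.0):
    # Evaluate eq 1: a sum of scaled/translated wavelet responses
    # (the wavelet network) plus the linear ARX term c'x + c0.
    # `neurons` is a list of (k, t_kl) pairs; s_k = a**k as in eq 2.
    x = np.asarray(x, dtype=float)
    wavelet_part = 0.0
    for (k, t_kl), w_kl in zip(neurons, w):
        s_k = a**k
        r = np.linalg.norm(x - np.asarray(t_kl)) / s_k
        wavelet_part += w_kl * a**(-k / 2.0) * mexican_hat(r)  # eq 2 prefactor
    return wavelet_part + float(np.dot(c, x)) + c0

# One-dimensional usage: two neurons at scale k = 0
f = wave_arx([0.4], [(0, [0.0]), (0, [1.0])], w=[0.5, -0.2], c=[0.1], c0=0.0)
```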
2.2. Training of the WaveARX Network. Training of the WaveARX network consists of three steps: (1) generating neuron candidates, (2) purifying neurons, and (3) optimizing the wavelet parameters. In step 1, neuron candidates are selected by multiresolution analysis (MRA) based on the distribution density of the input training data. That is, wherever the data density is high enough, a neuron is assigned to cover that group of data. With this initial selection of neurons, a prototype of the WaveARX network is established. However, some of these neuron candidates may not have a significant impact on the desired output. Keeping them all could overfit the data and increase the
computational burden. Therefore, in the second step, the classical Gram-Schmidt algorithm (CGS) is used as a purifying scheme to retain the most contributive neurons and remove the least contributive ones. Namely, by projecting the output training data onto the neurons, we find that the projections may be dense on some neuron candidates and sparse on the others. Those neurons responding to few or no data, from the output space viewpoint, are considered redundant and are removed. The neuron purifying step leads to a simpler structure of the WaveARX network without sacrificing model accuracy. Finally, we bring the refined network even closer to the system model using a gradient search algorithm for optimization. Thus, the best possible approximation model can be derived. Due to space limitations, only the principles of this systematic design procedure are outlined; for more details, readers are referred to our previous study.

3. Adaptive WaveARX Network

The adaptive WaveARX network is an enhancement of the off-line WaveARX network. It allows an adaptive network architecture to be designed in real time. In the development of this adaptive algorithm, several questions should be considered: (1) How should the initial model for on-line identification be set up? (2) How should new data be incorporated and old data removed? (3) When, where, and how should a new neuron be generated? When, where, and how should an existing neuron be removed? (4) How should the network parameters be updated when new data are obtained?

Any recursive method requires some initial model as a starting point. Because of the encouraging results of the WaveARX network applications, the initial model of the adaptive WaveARX network is derived from the off-line WaveARX network. That is, the network is trained with a small amount of historical data, following the training steps discussed in section 2.2. However, the gradient search optimization is often unnecessary, because in most of our experiments the network still satisfies the error criteria without it. By undertaking the first two design procedures on a relatively small amount of historical data, an initial model is constructed and ready for on-line recursive application. The following subsections address the above concerns in terms of data processing, adaptive network structure, and parameter updates.

3.1. Data Processing. As the system evolves and new data are collected, it is natural to pay less attention to the old data and more attention to the new. In this research, this is accomplished by defining a moving rectangular data window. (This is not the only choice; the rectangular window is selected for ease of explanation.) The window moves to enclose the new data points and to leave out an equal number of the oldest data points. It can be defined as
$$Y_n = \Lambda \mathbin{.*} Y_{t,n}, \qquad X_n = \Lambda \mathbin{.*} X_{t,n} \qquad (3)$$

where Λ = {δ_i}_{i=0,1,...,n} and

$$\delta_i = \begin{cases} 1, & i \le m \\ 0, & i > m \end{cases} \qquad (4)$$

Here m is the size of the moving window, which contains the most current data sets. Λ is the window function, which ensures that data in the distant past (>m) are forgotten in order to keep track of the current system characteristics. Y_{t,n} = {y(t − i)}_{i=0,1,...,n}, X_{t,n} = {x(t − i)}_{i=0,1,...,n}, and ".*" denotes element-by-element multiplication. The equations in (3) represent the historic trend, from the newest data (i = 0) back to the oldest data (i = n).
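A minimal sketch of this windowing, with index 0 as the newest sample to match the Y_{t,n} convention above (the function name is ours):

```python
import numpy as np

def apply_window(Y_t, X_t, m):
    # Rectangular window of eqs 3 and 4: delta_i = 1 for the m most
    # recent samples (i <= m), 0 otherwise; index 0 is the newest point,
    # matching Y_t,n = {y(t - i)}.
    n = len(Y_t)
    delta = (np.arange(n) <= m).astype(float)           # eq 4
    Y_n = delta * np.asarray(Y_t, dtype=float)          # eq 3, ".*" product
    X_n = delta[:, None] * np.asarray(X_t, dtype=float)
    return Y_n, X_n
```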
3.2. Adaptive Network Structure. Recently, the concept of a dynamic topology has surged in the neural network literature. It suggests that the structure of the network should not be fixed. This proposition results from the fact that, if the characteristics of a process change with time, a structure that is too small is unlikely to fit the process well, while a structure that is too large wastes computational time, yields overfitting, and loses generalization ability. Therefore, a flexible network architecture is preferable, so that its size can be adjusted to compensate for changes in the process behavior.

Two common approaches have been used in the literature to determine the size of the network during training: constructive strategies and destructive strategies. A constructive strategy starts with a small network and adds neurons as necessary. The idea is attractive because it is often easy to train the parameters of these small networks. The cascade correlation architecture (Fahlman and Lebiere, 1991) falls into this category. It starts with only an input layer, an output layer, and the connections between them. Hidden neurons are then added sequentially to lower the residual error, and the addition of neurons stops when the error is small enough. Once the hidden neurons are built in, they cannot be altered or eliminated. On the contrary, a destructive strategy starts with a network with superfluous parameters and removes connections between neurons in adjacent layers that are superfluous for solving the problem. A neuron can be deleted from the network once all the connections to and from it are removed. The neuron removal procedure is repeated until the error reaches the desired criteria. Optimal brain damage (Le Cun et al., 1990), optimal brain surgeon (Hassibi et al., 1993), and the stripped net algorithm (Bhat and McAvoy, 1992) are examples of the destructive strategy. However, both strategies outlined above process data in batches. In addition, they only increase or only decrease the structure size (number of neurons), respectively. They lack flexibility for on-line nonlinear processes, which require changeable networks with sometimes more and sometimes fewer neurons.

Rather than using a trial-and-error constructive or destructive strategy, our adaptive WaveARX network constantly adds and removes neurons by locating and relocating them during the on-line design procedures. This is done each time a new window of data is defined. As stated in section 3.1, the sliding rectangular window continuously incorporates new data and removes old data. At each window movement, the data distribution within the window is checked by MRA in order to evaluate whether and where new neurons should be generated and whether existing neurons should be annihilated. With the new neuron candidates selected, CGS is employed to purify those neurons. Consequently, it is straightforward to decide when and where neurons will be added or eliminated, as sketched below.
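The following runnable sketch shows the shape of one structure-update cycle in one dimension. It uses crude stand-ins of our own devising: a dyadic-cell density count in place of the full MRA candidate generation, and an output-correlation test in place of the CGS projection; the thresholds and grid are illustrative, not values from the paper.

```python
import numpy as np

def adapt_structure(X, Y, a=2.0, k=0, density_min=3, contrib_min=0.05):
    # One window's structure update (1-D sketch).
    # MRA stand-in: propose a neuron for every dyadic cell of scale a**k
    # that holds enough input data.
    s = a**k
    cells, counts = np.unique(np.floor(X / s), return_counts=True)
    candidates = [c * s for c, cnt in zip(cells, counts) if cnt >= density_min]

    # CGS stand-in: keep candidates whose regressor correlates with the
    # output above `contrib_min` (a crude contribution test).
    kept, Phi_cols = [], []
    for t in candidates:
        col = (1 - ((X - t) / s)**2) * np.exp(-((X - t) / s)**2 / 2)
        if abs(np.corrcoef(col, Y)[0, 1]) >= contrib_min:
            kept.append(t)
            Phi_cols.append(col)

    # Outside parameters by least squares (wavelet columns + ARX part).
    Phi = np.column_stack(Phi_cols + [X, np.ones_like(X)])
    omega, *_ = np.linalg.lstsq(Phi, Y, rcond=None)
    return kept, omega
```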
Thus, the architecture of the adaptive WaveARX network is dynamic. When a change in the number of neurons occurs, an adjustment of the network parameters is indicated as well. The following section shows that the least-squares method is able to update the parameters with respect to the changes in the system being approximated.

3.3. Parameter Updates. Reviewing the structure of the WaveARX network in eq 1, the network parameters can be classified into two categories: outside parameters and inside parameters. The inside parameters are the scale (s_k) and translation (t_kl) defining the wavelet function of each neuron; their initial values are established when MRA is performed. The outside parameters are the weight coefficients w_kl for the wavelet net and c and c_0 for the ARX part of the network. With the inside parameters held fixed, eq 1 is a set of linear equations, so for a fixed set of neurons the outside parameters can be calculated with a linear least-squares method. To illustrate the recursive estimation of the outside parameters, let us define the WaveARX net in matrix form as
$$Y_n = \Phi_m \Omega_m \qquad (5)$$

where Φ_m = {φ_j} is an n × m matrix, φ_j is a wavelet, a linear variable, or a constant corresponding to the weights w_kl, c, and c_0, and Ω_m is an m-dimensional vector of these weights. Thus, the solution for the outside parameters is given by

$$\hat{\Omega}_m^{\mathrm{old}} = F_m^{-1} g_m \qquad (6)$$

where F_m = Φ_m^T Φ_m and g_m = Φ_m^T Y_n denote the elements of the matrix equation

$$[F_m \,|\, g_m] = [\Phi_m^T \Phi_m \,|\, \Phi_m^T Y_n] \qquad (7)$$
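A small sketch of this batch solution (function name ours); a linear solve is used rather than forming the inverse explicitly, a standard numerical precaution:

```python
import numpy as np

def outside_parameters(Phi, Y):
    # Solve eq 6, Omega = F^{-1} g, with F = Phi'Phi and g = Phi'Y (eq 7).
    F = Phi.T @ Phi
    g = Phi.T @ Y
    return np.linalg.solve(F, g), F, g
```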
Basically, the outside parameters should be updated according to the current neuron status and the arrival of new data.

3.3.1. Recursive Parameter Evaluation for Architecture Changes. When the number of neurons increases or decreases, or there is a change in neuron location, the matrix equation (6) could be reformulated and solved again from scratch, but this is computationally wasteful. A better approach is to reuse the computations already completed for the original network. To demonstrate how this is accomplished, first consider the case of generating a new neuron. With the addition of a new wavelet φ_{m+1} to the network, the matrix given in eq 6 takes the form

$$[F_{m+1} \,|\, g_{m+1}] = \begin{bmatrix} F_m & f_{m+1} & | & g_m \\ f_{m+1}^T & \nu_{m+1} & | & \gamma_{m+1} \end{bmatrix} \qquad (8)$$

where F_m and g_m are known from the previous calculation, f_{m+1} is an m-dimensional vector, and ν_{m+1} and γ_{m+1} are scalars computed using the (m + 1)th neuron φ_{m+1}. The estimate of the vector comes from the least-squares solution \(\hat{\Omega}_{m+1} = F_{m+1}^{-1} g_{m+1}\). It can be obtained in terms of the known F_m^{-1} and g_m along with
the new vector f_{m+1} and the new scalar ν_{m+1} according to

$$\hat{\Omega}_{m+1} = \begin{bmatrix} \hat{\Omega}_m^{\mathrm{new}} \\ \hat{w}_{m+1} \end{bmatrix} = \begin{bmatrix} F_m & f_{m+1} \\ f_{m+1}^T & \nu_{m+1} \end{bmatrix}^{-1} \begin{bmatrix} g_m \\ \gamma_{m+1} \end{bmatrix} \qquad (9)$$

Let D_{m+1} = F_m^{-1} f_{m+1} and β_{m+1} = 1/(ν_{m+1} − f_{m+1}^T D_{m+1}). Since \(\hat{\Omega}_{m+1} = F_{m+1}^{-1} g_{m+1}\), \(\hat{\Omega}_{m+1}\) can be determined by

$$\hat{\Omega}_m^{\mathrm{new}} = \hat{\Omega}_m^{\mathrm{old}} - \beta_{m+1} D_{m+1} \Phi_{m+1}^T e_{m+1}, \qquad \hat{w}_{m+1} = \beta_{m+1} \Phi_{m+1}^T e_{m+1} \qquad (10)$$

where e_{m+1} = Y_n − Φ_m \(\hat{\Omega}_m^{\mathrm{old}}\) can be interpreted as the residual vector of the original m-neuron model. If there are additional new neurons, they can be added into the network model one at a time using the above recursive procedure.

When an existing neuron needs to be eliminated, it is necessary to obtain the new inverse F_{m-1}^{-1} before the corresponding weight can be removed. If the network has been retrained, it is, of course, necessary to construct and invert F_{m-1} once more. However, if the network is not retrained, a simple trick from the inverse of partitioned matrices (Horn, 1985) can be used to obtain the inverse F_{m-1}^{-1} of the reduced matrix:

$$F_{m-1}^{-1} = F_m^{-1}(\xi,\xi) - F_m^{-1}(\xi,\xi')\,F_m^{-1}(\xi',\xi)/F_m^{-1}(\xi',\xi') \qquad (11)$$

where F_m^{-1}(α,β) is the submatrix of F_m^{-1} formed from the rows indexed by α and the columns indexed by β, ξ′ indicates the row (or column) being deleted, and ξ indicates the rows (or columns) other than ξ′. Then

$$\hat{\Omega}_{m-1} = F_{m-1}^{-1} g_{m-1} \qquad (12)$$
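A sketch of this bookkeeping, assuming the block-inverse algebra of eqs 8-12 above (the function names and the explicit F^{-1} block update are our illustrative choices, not the paper's code):

```python
import numpy as np

def add_neuron(F_inv, g, omega, phi_new, Phi, Y):
    # Grow the net by one wavelet column phi_new (eqs 8-10), reusing the
    # existing F^{-1} instead of refitting from scratch.
    f = Phi.T @ phi_new                    # f_{m+1}
    nu = float(phi_new @ phi_new)          # nu_{m+1}
    gamma = float(phi_new @ Y)             # gamma_{m+1}
    D = F_inv @ f
    beta = 1.0 / (nu - f @ D)
    e = Y - Phi @ omega                    # residual of the m-neuron model
    w_new = beta * (phi_new @ e)           # eq 10
    omega_new = np.append(omega - D * w_new, w_new)
    # block update of F^{-1} via the partitioned-inverse identity
    F_inv_new = np.block([[F_inv + beta * np.outer(D, D), -beta * D[:, None]],
                          [-beta * D[None, :],            [[beta]]]])
    return omega_new, F_inv_new, np.append(g, gamma)

def remove_neuron(F_inv, g, idx):
    # Delete neuron idx with eq 11's partitioned-inverse trick, then
    # re-solve eq 12 for the reduced weight vector.
    keep = [i for i in range(F_inv.shape[0]) if i != idx]
    b = F_inv[keep, idx]
    F_inv_red = F_inv[np.ix_(keep, keep)] - np.outer(b, b) / F_inv[idx, idx]
    return F_inv_red @ g[keep], F_inv_red, g[keep]
```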
3.3.2. Recursive Parameters Based on the Arrival of New Data. When there are no changes in the network architecture, it is still desirable to update the model using fresh data. A recursive algorithm can thus make use of the new information to improve the outside parameters. This topic is well studied in the adaptive control and system identification literature (Goodwin and Sin, 1984; Ljung, 1987). Technically, at each sampling time the moving rectangular window brings a new set of data into the picture and removes an old set, so we focus on the most recent data and discard the old. The derivation of the following formulas is straightforward, so the results are given without proof. To update the outside parameters when old data are trimmed, use

$$\hat{\Omega}_m(n+1) = \hat{\Omega}_m(n) - F_d(n)\,\Phi_m(n+1)\,\xi_d(n+1)\,e(n+1)$$
$$\xi_d(n+1) = \frac{1}{1 - \Phi_m^T(n+1)\,F_d(n)\,\Phi_m(n+1)}$$
$$e(n+1) = y(n+1) - \Phi_m(n+1)\,\hat{\Omega}_m(n)$$
$$F_d(n+1) = F_d(n) + F_d(n)\,\Phi_m(n+1)\,\xi_d(n+1)\,\Phi_m^T(n+1)\,F_d(n) \qquad (13)$$

To update the outside parameters when new data are incorporated, consider

$$\hat{\Omega}_m(n+1) = \hat{\Omega}_m(n) + F_a(n)\,\Phi_m(n+1)\,\xi_a(n+1)\,e(n+1)$$
$$\xi_a(n+1) = \frac{1}{1 + \Phi_m^T(n+1)\,F_a(n)\,\Phi_m(n+1)}$$
$$e(n+1) = y(n+1) - \Phi_m(n+1)\,\hat{\Omega}_m(n)$$
$$F_a(n+1) = F_a(n) - F_a(n)\,\Phi_m(n+1)\,\xi_a(n+1)\,\Phi_m^T(n+1)\,F_a(n) \qquad (14)$$

where F_d and F_a are often called the covariance matrices for the trimmed old data and the incorporated new data, respectively.
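A minimal sketch of the two updates (function names ours; phi is the regressor vector of the data pair entering or leaving the window):

```python
import numpy as np

def incorporate(omega, F_a, phi, y):
    # Fold one new data pair into the estimate (eq 14).
    e = y - phi @ omega
    xi = 1.0 / (1.0 + phi @ F_a @ phi)
    omega = omega + F_a @ phi * xi * e
    F_a = F_a - np.outer(F_a @ phi, phi @ F_a) * xi
    return omega, F_a

def trim(omega, F_d, phi, y):
    # Forget the pair that leaves the rectangular window (eq 13).
    e = y - phi @ omega
    xi = 1.0 / (1.0 - phi @ F_d @ phi)
    omega = omega - F_d @ phi * xi * e
    F_d = F_d + np.outer(F_d @ phi, phi @ F_d) * xi
    return omega, F_d
```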
4. Simulation Studies

In this section, two simulation cases are presented to evaluate the adaptive WaveARX network. Each case is chosen to demonstrate a specific ability of this on-line identification algorithm. In the first example, the adaptive WaveARX network is employed to predict a logistic map, a chaotic time series; its ability to track the different chaotic behaviors arising from a time-variant change of a map parameter is shown. The second example illustrates how the proposed algorithm identifies the nonlinear dynamics of a pH CSTR model in real time, in comparison with the traditional recursive ARX model. In both cases, by employing only a few historical data points, the adaptive WaveARX network is able to adjust its structure and parameters based on the most current system information.

The moving data window is set to cover the most current 20 data pairs in example 1 and 40 data pairs in example 2. The window size is determined by the prediction quality: the bigger the window, the more robust the prediction the network can achieve, but a bigger window usually increases the computation time, so a trade-off between prediction quality and computation time must be made. In the following simulation examples, the initial window size is chosen arbitrarily and then tuned slightly according to the required prediction error. The criterion NSRMSE (normalized square root of mean square error) is used to determine whether to increase or decrease the number of neurons:
$$\mathrm{NSRMSE} = \left(\frac{E_s}{P \cdot \mathrm{Var}[y_i]}\right)^{1/2} \qquad (15)$$

where \(E_s = \sum_{i=1}^{P} (f(x_i) - y_i)^2\), P is the number of data points, and Var denotes the variance.
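Eq 15 is straightforward to compute; a one-function sketch (name ours, population variance assumed for Var):

```python
import numpy as np

def nsrmse(y_pred, y_true):
    # Eq 15: sqrt(E_s / (P * Var[y])), normalized square root of MSE.
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    Es = np.sum((y_pred - y_true)**2)
    return np.sqrt(Es / (y_true.size * np.var(y_true)))
```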
Example 1. Predicting Chaos. Prediction of chaotic time series is a relatively new research topic. It is well-known that chaotic systems are characterized by features such as strange attractors and positive Lyapunov exponents, so a time series of data sampled from a deterministic chaotic system usually appears stochastic when analyzed with linear methods. Because of the extreme sensitivity of chaotic systems to uncertainties in the initial condition, only short-term prediction is possible, even with perfect reconstruction of the dynamical equations in a noise-free setting. Here, the proposed adaptive network is used to identify a chaotic time series model. In this example, the first-order nonlinear logistic map is employed (May, 1976):
$$x(k+1) = R\,x(k)\,(1 - x(k)) \qquad (16)$$

The parameter R determines the degree of nonlinearity of the map. This problem was investigated by several researchers using RBFN (Moody and Darken, 1989; Stokbro et al., 1990; Lowe and Webb, 1991) and by Bakshi and Stephanopoulos' Wave-Net (Bakshi and Stephanopoulos, 1993). In those papers, the parameter R was always fixed. In this example, we not only repeat the same problem with a fixed R but also explore a time-variant map by changing the value of R. Our goal is to predict the future value x(k+τ) from the known values of the time series up to the point x(k), where τ is the number of time units ahead. The standard method for this type of prediction is to create a mapping from an embedding of E variables of the time series, spaced ∆ apart; that is, we use (x(k), x(k−∆), ..., x(k−(E−1)∆)) to forecast x(k+τ). The values E = 1, ∆ = 1, and τ = 1 are used here, so each output-input data pair is formatted as

$$[x(k+1);\ x(k)] \qquad (17)$$

The logistic map with changing R is used to test the ability of the adaptive WaveARX network to track the system behavior as it varies with time. The R value varies according to

$$R(k) = \begin{cases} 4, & k < 90 \\ 3.5 + 0.5e^{-0.01(k-89)}, & k \ge 90 \end{cases} \qquad (18)$$

Table 1. Major Bifurcation Points and Chaotic Boundaries for the Logistic Map^a

R            explanation
4.0~3.570    chaos reigns
3.570        chaos begins
...          ...
3.569        period of 32
3.564        period of 16
3.544        period of 8
3.449        period quadruples

^a The logistic map: x(k+1) = Rx(k)(1 − x(k)).

Figure 1. Predicted output (point) from adaptive WaveARX [actual output = circle].
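For reference, a small sketch that generates this time-variant series and its training pairs (the initial condition x0 and series length are our illustrative assumptions):

```python
import numpy as np

def logistic_series(n=500, x0=0.3):
    # Simulate the time-variant logistic map (eqs 16 and 18) and form
    # the one-step-ahead training pairs [x(k+1); x(k)] of eq 17.
    x = np.empty(n)
    x[0] = x0
    for k in range(n - 1):
        R = 4.0 if k < 90 else 3.5 + 0.5 * np.exp(-0.01 * (k - 89))  # eq 18
        x[k + 1] = R * x[k] * (1.0 - x[k])                           # eq 16
    pairs = np.column_stack([x[1:], x[:-1]])                         # eq 17
    return x, pairs
```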
Under these conditions, the system behavior changes from chaotic to periodic (Table 1). The result of predicting one step into the future (τ = 1) using the adaptive WaveARX network is presented in Figure 1. The predicted outputs (plotted as solid points) closely follow the actual system behavior (plotted as small circles); their differences can only be seen in the prediction error on a finer scale, as in Figure 2. The number of neurons in the adaptive WaveARX network varies from 12 to 2 during the on-line operation. More specifically, the location of each neuron at a given k is illustrated as a black or darker block on the scale-translation planes (Figure 3). The distribution of the neurons at k = 100 is depicted in Figure 3a; a total of eight neurons qualify at this moment. As the window moves, four neurons remain at sampling time k = 200 (Figure 3b), and only two neurons are needed at k = 400. The reason for annihilating neurons is that the process gradually follows a more regular pattern (from chaotic to periodic). The scale-translation planes thus illustrate the evolution of the WaveARX net and show that the number and location of neurons depend strongly on the changes in the process behavior. This can help us diagnose process problems, detect irregularities, or verify normal operation, a topic worth addressing in our future research.

Furthermore, we compare the prediction results of the adaptive WaveARX network with those of two traditional nonrecursive neural network methods, MLP and RBFN. As parts a and b of Figure 4 show, MLP and RBFN gradually lose track of the actual output as R changes. In comparison with Figure 1, the adaptive WaveARX network, which captures the changing process behavior quite well, obviously outperforms these nonrecursive models. This suggests that the adaptive feature of the proposed algorithm greatly improves identification of the time-variant chaotic process.
Figure 2. Prediction error of the adaptive WaveARX.
One might wonder whether MLP and RBFN could achieve prediction results similar to those of the adaptive WaveARX network if these two models were implemented with the same adaptive technique. To answer this question, the basic differences among the activation functions of MLP, RBFN, and WaveARX should be considered. It is well-known that neural networks are nonlinear black box structures able to describe arbitrary nonlinear dynamics. They all share a common framework,
$$\hat{y}(x) = f(\varphi(x),\theta) \qquad (19)$$
where f is a nonlinear function parametrized by θ and the components of φ(x) are regressors. This nonlinear structure indicates that the regressors, which are formed from the input data, span a submanifold of the nonlinear space, and the function f to be approximated can be projected onto that submanifold. The prediction ŷ(x) usually forms an expansion of a simple basis regressor function, like the expansion of the activation function in MLP, of the radial basis function in RBFN, and of the wavelet function in WaveARX.
Figure 4. (a) Predicted output (point) from MLP and (b) predicted output (point) from RBFN [actual output = circle].

Figure 3. (a) Eight neurons at k = 100, (b) four neurons at k = 200, and (c) two neurons at k = 400.
However, the adaptive technique can hardly be imposed on MLP because of the global nature of its activation function. Since each global function covers the whole input space, MLP requires a long time to modify all of its parameters whenever the process behavior changes, and the influence of each activation function over a large region also makes it hard to tell which neurons should be kept or annihilated. On the other hand, it may be easier to implement the adaptive technique on RBFN, which contains local functions like WaveARX, but RBFN still suffers from a lengthy search to locate the centers and widths of its activation functions. Therefore, MLP and RBFN may not perform as well as the adaptive WaveARX even when integrated with the adaptive features.

Example 2. Modeling a pH CSTR System. This case is based on the simulation study of Bhat and McAvoy (1990), in which a pH CSTR system neutralizes
acetic acid with sodium hydroxide. To generate the time-series data, a series of 1% uniform pseudorandom amplitude step changes is added to the system around the steady-state NaOH flow rate of 515 L/min and pH 9 (Figure 5a). The sampling time is 0.4 min. Near this steady state, the reactor exhibits considerable nonlinearity: the pH value changes sharply with concentration changes because of the high nonlinear gain. The goal of this example is to identify the dynamic system on-line using the adaptive WaveARX network, in comparison with the recursive linear ARX model. Parts b and c of Figure 5 show that both algorithms seem to approximate the system pattern well. However, examination of the prediction errors of the two models (Figure 6a,b) makes it clear that the prediction error of the adaptive WaveARX net is less than half that of the ARX model; it is clearly superior to the recursive linear ARX.

This example has been used by Bhat and McAvoy in two publications. In their first study (Bhat and McAvoy, 1990), an MLP with five hidden neurons was able to approximate the system well. Later, in the second study, the stripped network (Bhat and McAvoy, 1992) removed two redundant neurons from the original four neurons under their reactor steady-state operating conditions. Both networks were developed by batch training procedures. As opposed to these off-line applications, the adaptive WaveARX network, with three to seven neurons, is able to predict the reactor behavior on-line. In spite of the sharp and frequent changes in the NaOH flow rate, the adaptive WaveARX network performs well.
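A sketch of the excitation signal described above; the step hold length and the random seed are illustrative assumptions, as the paper does not state them:

```python
import numpy as np

def naoh_excitation(n_steps=200, hold=5, q_ss=515.0, rel_amp=0.01, seed=0):
    # Uniform pseudorandom amplitude steps of +/-1% around the
    # steady-state NaOH flow of 515 L/min (sampling time 0.4 min).
    rng = np.random.default_rng(seed)
    levels = q_ss * (1.0 + rel_amp * rng.uniform(-1.0, 1.0, size=n_steps))
    return np.repeat(levels, hold)  # each level held for `hold` samples
```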
Figure 6. (a) Prediction error of adaptive WaveARX and (b) prediction error of recursive ARX.

Figure 5. pH CSTR system: (a) base flow rate; (b) predicted output from adaptive WaveARX (dash-dot); (c) predicted output from recursive ARX (dash-dot) [actual output = line].
5. Conclusion

This paper explores many aspects of the adaptive WaveARX network, including its conceptual framework, architecture development, training procedures, and applications. It is an extension of our previous research on the WaveARX network. The preceding study showed that the WaveARX network is an off-line approach which can capture the linear and nonlinear contributions separately without prior knowledge of the system behavior, and that its localization property yields fast convergence and time-saving training. Modifying this network into the new adaptive WaveARX network not only takes advantage of the off-line design procedures but also achieves real-time system identification. Owing to the flexible structure inherent in the WaveARX network, the proposed recursive algorithm is able to adjust the number and location of neurons, as well as the parameters, according to the changes in the monitored process.
With a moving rectangular window, new data can easily be incorporated for on-line analysis. The data distribution within the window is treated by MRA and CGS, which constantly provide the latest information for updating the network structure. Unlike much of the research in the literature, the adaptive WaveARX network ensures a more compact form while accomplishing the desired identification result. Some of the salient features of the adaptive WaveARX network were illustrated in the two simulation cases. Through on-line identification, we gain insight into the characteristics of the process behavior. This gives us a good starting point for the development of process diagnosis, operator knowledge extraction, or model-based control in our future research. As mentioned previously, the window size of the on-line adaptive WaveARX is chosen arbitrarily and then slightly adjusted; this window selection method is not perfect, so the problem needs to be addressed in the future. Also, an alternative to the combination of gradient search for the inside parameters and least squares for the outside parameters may be considered in follow-up studies.

Nomenclature

L2 = 2-norm
N = number of wavelet functions
S = frame operator, H → H
X = input pattern vector, ∈ R^n
Y = output pattern vector, ∈ R^P
Z = integers
a = discrete scale step size
b = discrete translation step size
c = linear coefficients of WaveARX
c_0 = constant of WaveARX
f = function to be approximated
s_k = discrete scale
t_kl = discrete translation
w_kl, w_j = discrete wavelet coefficients

Greek Symbols

Ω = parameter vector
ψ_kl = discrete wavelet function
Literature Cited

Bakshi, B.; Stephanopoulos, G. Wave-Net: A Multi-Resolution, Hierarchical Neural Network with Localized Learning. AIChE J. 1993, 39, 57.
Bhat, N.; McAvoy, T. J. Use of Neural Nets for Dynamic Modeling and Control of Chemical Process Systems. Comput. Chem. Eng. 1990, 14 (4/5), 573.
Bhat, N.; McAvoy, T. J. Determining Model Structure for Neural Models by Network Stripping. Comput. Chem. Eng. 1992, 16 (4), 271.
Bialasiewicz, J. T.; Ho, T. T. Neural Adaptive Identification and Control. Proceedings of the 1991 International Conference on Artificial Neural Networks in Engineering, St. Louis, MO, 1991; p 519.
Chen, J.; Bruns, D. D. WaveARX Neural Network Development for System Identification Using a Systematic Design Synthesis. Ind. Eng. Chem. Res. 1995, 34 (12), 4420.
Chen, S.; Billings, S. A. Neural Networks for Nonlinear Dynamic System Modeling and Identification. Int. J. Control 1992, 56 (2), 319.
Chen, S.; Billings, S. A.; Grant, P. M. Recursive Hybrid Algorithm for Non-linear System Identification Using Radial Basis Function Networks. Int. J. Control 1992, 55 (5), 1051.
Daubechies, I. Ten Lectures on Wavelets; Society for Industrial and Applied Mathematics: Philadelphia, PA, 1992.
Fahlman, S. E.; Lebiere, C. The Cascade-Correlation Learning Architecture. Technical Report CMU-CS-90-100; School of Computer Science, Carnegie Mellon University: Pittsburgh, PA, 1991.
Goodwin, G. C.; Sin, K. S. Adaptive Filtering, Prediction and Control; Prentice-Hall: Englewood Cliffs, NJ, 1984.
Hassibi, B.; Stork, D. G.; Wolff, G. J. Optimal Brain Surgeon and General Network Pruning. Proceedings of the International Conference on Neural Networks, San Francisco, CA, 1993; Vol. I, p 293.
Horn, R. A.; Johnson, C. R. Matrix Analysis; Cambridge University Press: Cambridge, U.K., 1985.
Le Cun, Y.; Denker, J. S.; Solla, S. A. Optimal Brain Damage. In Advances in Neural Information Processing Systems; Morgan Kaufmann: San Mateo, CA, 1990; Vol. 2, p 598.
Ljung, L. System Identification: Theory for the User; Prentice-Hall: Englewood Cliffs, NJ, 1987.
Lowe, D.; Webb, A. R. Time Series Prediction by Adaptive Networks: A Dynamical Systems Perspective. IEE Proc. F: Radar Signal Process. 1991, 17.
May, R. M. Simple Mathematical Models with Very Complicated Dynamics. Nature 1976, 261, 459.
Moody, J.; Darken, C. J. Fast Learning in Networks of Locally-Tuned Processing Units. Neural Comput. 1989, 1, 281.
Seidl, D. R.; Lorenz, R. D. A Least-Squares Derivation of Output Error Feedback for Direct State Estimate Correlation in Recurrent Neural Networks. International Neural Network Society Annual Meeting, San Diego, CA, 1994; p II-299.
Stokbro, K.; Umberger, D. K.; Hertz, J. A. Exploiting Neurons with Localized Receptive Fields to Learn Chaos. Complex Syst. 1990, 4, 603.
Received for review January 2, 1996
Revised manuscript received May 24, 1996
Accepted May 28, 1996

IE960008A

^X Abstract published in Advance ACS Abstracts, July 1, 1996.