Ind. Eng. Chem. Res. 1993, 32, 1927-1936
PROCESS DESIGN AND CONTROL

Integration of Multilayer Perceptron Networks and Linear Dynamic Models: A Hammerstein Modeling Approach

HongTe Su†
Sensor and System Development Center, MN65-2300, Honeywell Inc., Minneapolis, Minnesota 55418

Thomas J. McAvoy‡
Institute for Systems Research and Department of Chemical Engineering, University of Maryland, College Park, Maryland 20742

† E-mail: [email protected]. ‡ E-mail: [email protected].
Recently, neural network dynamic modeling has drawn a great deal of attention not only from academia but also from industry. It has been shown that neural networks can learn to mimic the behavior of a complex chemical process. However, a neural network dynamic model might be difficult to develop when training data do not contain sufficient nonlinear information about the process. In addition, owing to the capacity of a modern control system, a great deal of steady-state data is stored. Most modeling techniques to date do not take into account the fact that the transient data may not contain sufficient nonlinear information and that the nonlinear steady-state data are abundant and easily accessible. In order to fully utilize the abundant steady-state information, this paper presents a neural network Hammerstein (NNH) modeling approach. In the proposed approach, a static neural network is integrated with a linear dynamic model in series. A constrained gradient-based algorithm is derived for training the linear dynamic model in order to retain the nonlinearity established by the static neural network. To demonstrate the proposed modeling approach, a complex polymerization process is used as an example.

1. Introduction
Recently, neural network dynamic modeling has drawn a great deal of attention not only from academia but also from industry. It has been shown that neural networks can learn to mimic the behavior of a complex chemical process (Bhat and McAvoy, 1990; Su et al., 1992). However, a dynamic neural network model might not be needed in cases where the transient data do not contain sufficient nonlinear information about the process. Given a set of transient data, a nonlinearity test is therefore recommended before training a neural network on the data. A simple yet useful nonlinearity test was proposed by Billings and co-workers (Billings and Voon, 1985). In general, the outcome of a nonlinearity test can lead to the following cases:

1. The transient data fail the nonlinearity test, and the steady-state data indicate a linear process. All linear systems belong to this category.

2. The transient data fail the nonlinearity test, but the steady-state data indicate a nonlinear process. A nonlinear process belongs to this category if its transient data are collected by means of mild perturbation around the normal operating condition.

3. The transient data pass the nonlinearity test, and the steady-state data indicate a nonlinear process. In principle, most nonlinear systems belong to this category.

4. The transient data pass the nonlinearity test, but the steady-state data are linear. For example, the process

\dot{y}\,y + y = u \qquad (1)

belongs to this category.
If the transient data fail the nonlinearity test, training a neural network dynamic model on the data is not necessary; a linear dynamic model is sufficient if the transient data do not contain sufficient nonlinear information about the process. In the first case, a good linear dynamic model will be valid for the system regardless of operating conditions. In the second case, the linear model will not be valid other than in the vicinity of the operating region where the transient data are collected; thus, a nonlinear dynamic model is needed. In this case, however, training a nonlinear dynamic model on the transient data does not make sense unless the nonlinear steady-state data are incorporated for training. This point will be discussed later. In the third case, a nonlinear model is required. A nonlinear model can be a full nonlinear model (e.g., a neural network model) or a simplified nonlinear model (e.g., a Hammerstein model). Since training a full nonlinear model requires intensive computation, simplified nonlinear models, such as Hammerstein models and Wiener models, are useful alternatives in this case. As will be discussed below, the simplified models are also useful for modeling nonlinear processes in the second case, where the transient data do not contain sufficient nonlinear information about the process but nonlinear steady-state data are available. A schematic diagram which illustrates the situations where a simplified model can be used is shown in Figure 1. The fourth case, discussed above, is not considered here because a chemical process rarely belongs to this category. First, this paper presents the use of a Hammerstein model for modeling nonlinear processes when the transient data taken from the process contain insufficient nonlinear information. The Hammerstein model used here consists of a static neural network connected in series with a linear dynamic model.
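For readers who wish to try such a screening step on their own data, the following sketch compares simple linear and nonlinear correlation functions between an input and the output. It is a rough stand-in for, not a reproduction of, the Billings-Voon test statistics; the function names, the squared-input surrogate, and the lag range are illustrative assumptions.

```python
import numpy as np

def xcorr(a, b, max_lag):
    """Normalized cross-correlation between two series for lags 0..max_lag."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    return np.array([np.mean(a[k:] * b[:n - k]) for k in range(max_lag + 1)])

def nonlinearity_screen(u, y, max_lag=40):
    """Linear check: correlation of y with u.  Nonlinear check: correlation of y
    with the squared, mean-removed input.  If only the linear correlations are
    significantly nonzero, a linear dynamic model is likely adequate."""
    linear_corr = xcorr(y, u, max_lag)
    nonlinear_corr = xcorr(y, (u - u.mean()) ** 2, max_lag)
    return linear_corr, nonlinear_corr
```

In the case study below (Figure 12), exactly this kind of comparison is what indicates that the transient data carry essentially linear information only.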
Figure 3. Example of the two parametrization approaches (commonly parametrized and separately parametrized) of a Hammerstein model for modeling a 2 x 1 system.
A static neural network could thus be used as the nonlinear operator in a Hammerstein model. For a multiple-input system, a neural network Hammerstein model is a commonly parametrized model. Like a conventional commonly parametrized Hammerstein model, training a neural network Hammerstein model using conventional parameter estimation approaches might require intensive computation. The discussion that follows, however, focuses on the fact that the neural network Hammerstein model can make use of steady-state data in order to accelerate the training as well as to compensate for the shortage of nonlinear information contained in the transient data.

2.1. Use of Static Data. Modern process control systems usually store a great deal of data, which often contain steady-state information at many different operating conditions. One can consider using this abundant steady-state information for training a nonlinear model. However, most of the nonlinear dynamic modeling techniques to date do not consider the abundant and easily accessible steady-state information. In most of the existing dynamic modeling techniques, only transient data are used. The data are assumed to be generated by persistent excitation and to be well-distributed over the whole range of interest. In practice, however, an existing plant can only be mildly perturbed around normal operating conditions, which often results in a shortage of nonlinear information in the transient data. In this paper, a new methodology that uses steady-state data as well as transient data for training a Hammerstein model is proposed. Unlike the conventional Hammerstein modeling technique, which assumes the transient data alone are sufficient, this approach instead uses additional steady-state data in order to accelerate the training. This approach not only takes into account the fact that a chemical plant can only be mildly perturbed around a nominal operating condition but also takes full advantage of the abundance of steady-state information. Instead of using a polynomial, a neural network is used here as the nonlinear static operator. Owing to the nature of a Hammerstein model, in which the static nonlinear operator is separated from the dynamic operator, training a Hammerstein model can be much faster than training a full nonlinear neural network dynamic model. The fundamental idea of this approach is to train the nonlinear static operator and the linear dynamic operator of the Hammerstein model separately. The static neural network is used to learn the nonlinearity of the process, and the linear dynamic operator is used to learn the dynamics of the process. One can train these two components separately and sequentially on the steady-state data and the transient data, respectively. In order to use this approach, however, one must assume that the available steady-state information covers a wider range of operating conditions than the range in which the transient data are
Figure 4. Architecture of a multilayer perceptron network, with layers 0 through L: fan-out input units, bias units, sigmoidal hidden units receiving weighted sums, and output units that can be sigmoidal or linear.
generated, and that the steady-state data can represent the static nonlinearity of the system sufficiently. The procedure of this new methodology is discussed below. First, the neural network is trained on the nonlinear steady-state data, where the output of the data is the target for training. After training, the neural network can then serve as a nonlinear operator (the details will be discussed later). Given a set of transient data, the input variables of the data are then transformed by the nonlinear operator into a set of intermediate variables. The linear dynamic model is then trained on the intermediate variables and the output variables of the transient data (i.e., the output is the target for training). Since the dynamic model is trained separately, one must impose a constraint on the dynamic model in order to retain the nonlinearity learned by the neural network; a sketch of this two-stage procedure is given below. A detailed derivation of this approach is presented in the following sections. In addition, the modification of this approach for a Wiener model is also discussed.
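The procedure can be summarized by the following sketch, which assumes steady-state data (u_ss, y_ss) and transient data (u_t, y_t) are available as numpy arrays and uses scikit-learn only as a convenient stand-in for the static network; the plain least-squares fit in the last step is a simplification of the constrained training derived in section 4.

```python
import numpy as np
from scipy.special import expit                       # logistic sigmoid
from sklearn.neural_network import MLPRegressor

# Step 1: train the static network on the steady-state data
# (u_ss: N x Nu array of scaled inputs, y_ss: length-N array of scaled outputs).
static_net = MLPRegressor(hidden_layer_sizes=(5,), activation='logistic',
                          solver='lbfgs', max_iter=5000)
static_net.fit(u_ss, y_ss)

# Step 2: use the fitted input-to-hidden mapping as the nonlinear operator and
# transform the transient inputs u_t into intermediate variables h(t).
V, nu = static_net.coefs_[0], static_net.intercepts_[0]
h_t = expit(u_t @ V + nu)                             # h(t) = sigma(V u(t) + nu)

# Step 3: fit a linear dynamic (ARX-type) model between h(t) and y(t).
# Plain least squares is used here for brevity; the paper instead trains this
# operator with a constraint that preserves the static nonlinearity.
Ky, Ku = 3, 3
rows, targets = [], []
for t in range(max(Ky, Ku), len(y_t)):
    past_y = [y_t[t - k] for k in range(1, Ky + 1)]
    past_h = list(np.concatenate([h_t[t - k] for k in range(1, Ku + 1)]))
    rows.append(past_y + past_h)
    targets.append(y_t[t])
M, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
```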
3. Neural Network Static Operator

3.1. Neural Networks. A general multilayer perceptron network can be depicted as in Figure 4. Its mathematical form can be written as follows:

s^l_j(n) = \sum_i w^l_{ji}\,\xi^{l-1}_i(n), \qquad \xi^l_j(n) = \sigma[s^l_j(n)], \qquad l = 1, \dots, L

where \xi^l_j(n) denotes the output of the jth neuron in the lth layer, s^l_j(n) the input to that neuron, w^l_{ji} the connection weights (bias terms are represented by weights from constant bias units, as indicated in Figure 4), and \sigma the activation function of a neuron. An input neuron can be considered as one with a linear activation function, i.e., \xi^0(n) = x(n). The hidden layers consist of neurons with a sigmoidal activation function, and the output neurons have either a sigmoidal or a linear activation function; in this study, a linear activation function is used in the output layer, i.e., \xi^L(n) = s^L(n). A sigmoidal activation function usually takes the following form:

\sigma(s) = \frac{1}{1 + \exp[-(\alpha s + \beta)]} \qquad (8)
Such a multilayer perceptron network has been proven to have universal function approximation properties with only one hidden layer of sigmoidal neurons (Cybenko, 1989; Hornik et al., 1990; Girosi and Poggio, 1990). From here on, a feedforward network refers to a three-layer feedforward network.

3.2. "Localized" Training Algorithm. For a multilayer perceptron network, a gradient-based training algorithm called the generalized delta rule (GDR) (Werbos, 1974; Rumelhart et al., 1986), which is also known as back-propagation, is most commonly used. Here, the GDR is expressed using the notation of ordered derivatives (Werbos, 1974; Su et al., 1992); please refer to the Appendix for the definition of ordered derivatives. By doing so, the training algorithm for the linear dynamic operator of a neural network Hammerstein (NNH) model can be easily derived. Provided that a set of steady-state input-output data (containing N samples) is given, a quadratic objective function
E = \frac{1}{2} \sum_{n=1}^{N} \sum_{j=1}^{N_y} [\hat{y}_j(n) - y_j(n)]^2 \qquad (9)

is usually formed as a criterion for network training. Using ordered derivatives, a localized gradient calculation of E with respect to the unknown weights can be easily derived (Su et al., 1992). Each component of the gradient \nabla_{W^l} E is
\frac{\partial^+ E}{\partial w^l_{ji}} = \sum_{n=1}^{N} \frac{\partial^+ E}{\partial s^l_j(n)} \frac{\partial s^l_j(n)}{\partial w^l_{ji}} \qquad (10)
where the "+" superscript indicates an ordered derivative and its absence indicates a conventional partial derivative. The conventional partial derivatives are calculated as they are written, without any substitution (Werbos, 1974; Werbos, 1989). Since the input to a neuron is simply a weighted sum of the outputs from the preceding layer of the neuron, the conventional derivative becomes

\frac{\partial s^l_j(n)}{\partial w^l_{ji}} = \xi^{l-1}_i(n) \qquad (11)
Now define a delta (\delta) as the partial derivative of the objective function E with respect to the input of a neuron:

\delta^l_j(n) \equiv \frac{\partial^+ E}{\partial s^l_j(n)} \qquad (12)

Substituting eqs 11 and 12 into eq 10 gives

\frac{\partial^+ E}{\partial w^l_{ji}} = \sum_{n=1}^{N} \delta^l_j(n)\,\xi^{l-1}_i(n) \qquad (13)

One obtains a formula similar to the delta rule, which was derived for a linear perceptron network (Widrow and Hoff, 1960).
With this formulation, the gradient with respect to the weights between the (l - 1)th and the lth layers is determined by the product of the (l - 1)th layer's outputs and the lth layer's deltas. Only local information is needed to calculate the gradient. Using ordered derivatives, one can further expand the delta as follows:
\delta^l_j(n) = \frac{\partial^+ E}{\partial s^l_j(n)} = \sigma'[s^l_j(n)] \sum_k \delta^{l+1}_k(n)\,w^{l+1}_{kj} \qquad (14)
where \sigma'[s^l_j(n)] is the differential of the activation function. For a sigmoidal neuron as given in eq 8, the differential of the activation function can be directly expressed by the neuron's output:

\sigma'[s^l_j(n)] = \alpha\,\xi^l_j(n)[1 - \xi^l_j(n)] \qquad (15)
so that the activation function does not have to be re-executed for gradient calculation. The activation function of an input neuron and an output neuron is simply an identity function. Through the recursive calculation of the delta, eq 14 propagates the information of the objective function (E) backward from the (l + 1)th to the lth layer.
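As a concrete, deliberately generic illustration of this localized calculation, the sketch below performs the forward pass and the backward delta recursion of eqs 10-14 for a single sample of a network with sigmoidal hidden layers and a linear output layer; the list-of-matrices representation of the weights and the variable names are assumptions made for the example.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def gdr_gradients(x, y_target, weights, biases):
    """Generalized delta rule for one sample: forward pass, then backward
    recursion of the deltas (eq 14); only local quantities are needed."""
    xi = [x]                                   # xi[l]: output of layer l
    for l, (W, b) in enumerate(zip(weights, biases)):
        s = W @ xi[-1] + b
        last = l == len(weights) - 1
        xi.append(s if last else sigmoid(s))   # linear output layer, sigmoidal hidden layers
    delta = [None] * len(weights)
    delta[-1] = xi[-1] - y_target              # derivative of 0.5*||error||^2 w.r.t. the output-layer input
    for l in range(len(weights) - 2, -1, -1):  # propagate the deltas backward, layer by layer
        delta[l] = (weights[l + 1].T @ delta[l + 1]) * xi[l + 1] * (1 - xi[l + 1])
    grads_W = [np.outer(delta[l], xi[l]) for l in range(len(weights))]
    grads_b = [delta[l] for l in range(len(weights))]
    return grads_W, grads_b
```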
3.3. Nonlinear Static Operator. Given a set of steady-state data, one calculates the scaled input and output, u(n) and Y(n), from the raw data, \tilde{u}(n) and \tilde{y}(n), so that

u(n) \circ \varphi_u^2 = \tilde{u}(n) - \bar{u}, \qquad Y(n) \circ \varphi_y^2 = \tilde{y}(n) - \bar{y}

where \circ denotes an element-by-element product of two vectors and \bar{u} and \bar{y} are the sample means of the raw input and output. Letting x(n) = u(n), a three-layer feedforward network can be trained on the data. Assuming that the difference between the predicted output and the data is small after training, one can get the following equations for the trained steady-state model:

h(n) = \sigma(Vu(n) + \nu) \qquad (17)

Y(n) = W h(n) + \omega \qquad (18)

where eq 18 is said to be an affine linear equation between Y(n) and h(n). An affine linear equation can be transformed into a purely linear equation as follows. Without loss of accuracy, one can rescale the output variable to be centered at \bar{y}_{new}:

\bar{y}_{new} = \bar{y} + \varphi_y^2 \circ \omega \qquad (19)

thus

y(n) = Y(n) - \omega \qquad (20)

so that the constant term \omega in eq 18 vanishes. As a result, eq 18 becomes

y(n) = W h(n) \qquad (21)

which constitutes a purely linear relationship between y(n) and h(n). Suppose a set of dynamic data is provided. One can then use eq 17 as the nonlinear operator, which transforms the input variables of the transient data into a set of intermediate variables. To do so, one first scales the transient data in the same way that the steady-state data are scaled, except that the output is scaled to be centered at \bar{y}_{new} in eq 19. The inputs and outputs of the transient data are denoted by u(t) and y(t), respectively. Notice that "(n)" denotes the steady-state data, whereas "(t)" denotes the transient data. Second, one converts the input u(t) into an intermediate variable h(t) using eq 17 as follows:

h(t) = P(u(t)) \equiv \sigma(Vu(t) + \nu) \qquad (22)

Then, the next step is to develop a linear dynamic model between h(t) and y(t).

4. Linear Dynamic Operator

Figure 5. The linear dynamic operator in a Hammerstein model can be trained by a series-parallel (sp) or a parallel (p) identification method; z^{-1} is a backward shift operator.

4.1. Linear Dynamic Models. In general, an identification model, regardless of the form of its mathematical representation, can be estimated by two different approaches: a series-parallel and a parallel identification method (Su et al., 1992). Figure 5 illustrates these two identification approaches. In the case where the switch "sp" is connected, the plant output is fed directly to the model for prediction during the training. This method is referred to as a series-parallel identification method, since the model is not only in parallel but also in series with the plant. On the contrary, when the switch "p" is on, the plant output is not fed to the model for predicting the plant output. In the latter case, the model is completely in parallel with the plant; thus, the method is referred to as a parallel identification method. The series-parallel method is good for a one-step predictive model, whereas the parallel method is good for a long-term predictive model (Su et al., 1992). Let the input vector of the identification model x(t) be as follows (see Figure 5):

x(t) = [y^T(t-1), \dots, y^T(t-K_y), h^T(t-1), \dots, h^T(t-K_u)]^T \quad \text{(series-parallel)}

x(t) = [\hat{y}^T(t-1), \dots, \hat{y}^T(t-K_y), h^T(t-1), \dots, h^T(t-K_u)]^T \quad \text{(parallel)} \qquad (23)

A linear dynamic model can be expressed as

\hat{y}(t) = M x(t) \qquad (24)
For convenience, the one-step predictive model (series-parallel method) is rewritten as

\hat{y}(t) = \sum_{\tau=1}^{K_y} A_\tau y(t-\tau) + \sum_{\tau=1}^{K_u} B_\tau h(t-\tau) \qquad (25)

Likewise, the long-range predictive model (parallel method) is rewritten as follows:

\hat{y}(t) = \sum_{\tau=1}^{K_y} A_\tau \hat{y}(t-\tau) + \sum_{\tau=1}^{K_u} B_\tau h(t-\tau) \qquad (26)

In these representations, the coefficient matrix M in eq 24 becomes

M = [A_1 \; \cdots \; A_{K_y} \;\; B_1 \; \cdots \; B_{K_u}] \qquad (27)
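A minimal numpy sketch of the two prediction modes of eqs 25 and 26 is given below; the function name, the initialization of the first few predictions with plant data, and the list-of-matrices representation of the A and B coefficients are illustrative assumptions.

```python
import numpy as np

def predict(A, B, h, y, parallel=False):
    """One pass over the data with the linear dynamic operator of eqs 25-26.
    A: list of Ky (Ny x Ny) matrices, B: list of Ku (Ny x Nh) matrices,
    h: T x Nh intermediate variables, y: T x Ny plant outputs.
    parallel=False feeds past plant outputs to the model (series-parallel, eq 25);
    parallel=True feeds the model's own past predictions back (parallel, eq 26)."""
    Ky, Ku = len(A), len(B)
    T = y.shape[0]
    y_hat = np.zeros_like(y)
    y_hat[:max(Ky, Ku)] = y[:max(Ky, Ku)]          # initialize with plant data
    for t in range(max(Ky, Ku), T):
        past = y_hat if parallel else y
        y_hat[t] = sum(A[k] @ past[t - 1 - k] for k in range(Ky)) \
                 + sum(B[k] @ h[t - 1 - k] for k in range(Ku))
    return y_hat
```

With `parallel=False` the sketch reproduces the one-step (series-parallel) predictor; with `parallel=True` its own past predictions are fed back, giving the long-term (parallel) predictor.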
4.2. Preservation of Static Nonlinearity. During the development of the linear dynamic model (see eq 24), the steady-state nonlinearity must remain the same. To achieve this goal, one can train the dynamic model subject to a constraint so that the nonlinearity learned by the steady-state model remains unchanged after the training. At steady state, the linear dynamic model shall have the following relationship between the input and output of the steady-state data:

y(n) = \sum_{\tau=1}^{K_y} A_\tau y(n) + \sum_{\tau=1}^{K_u} B_\tau h(n) \qquad (28)
Figure 6. The linear dynamic model can be considered as a two-layer linear network: (a) in the series-parallel identification method, the linear model is equivalent to a linear feedforward network; (b) in the parallel identification method, the linear model is equivalent to a linear recurrent network.
Substituting eq 21 into eq 28, one obtains the following linear relationship among the coefficients of the linear model:

\left( I_{N_y} - \sum_{\tau=1}^{K_y} A_\tau \right) W = \sum_{\tau=1}^{K_u} B_\tau \qquad (29)

If the following criterion is used

E = \frac{1}{2} \sum_{t=1}^{T} \sum_{j=1}^{N_y} [\hat{y}_j(t) - y_j(t)]^2 \qquad (30)
the identification of the dynamic model becomes a constrained optimization problem such that the equality constraint in eq 29 holds. In practice, one can convert an optimization problem with linear equality constraints into an unconstrained problem using substitutions (Wismer and Chattergy, 1978). Here, one rewrites eq 29 as follows:

B_{K_u} = \left( I_{N_y} - \sum_{\tau=1}^{K_y} A_\tau \right) W - \sum_{\tau=1}^{K_u - 1} B_\tau \qquad (31)
The number of model coefficients is thereby reduced by N_h N_y, which is the total number of elements of the matrix B_{K_u}. In this case, the derivatives required for the gradient calculation need to be derived.

4.3. Localized Training Algorithm. The training algorithm discussed previously can be used to derive the training algorithm for the linear dynamic model. In fact, the linear model used in the series-parallel method is equivalent to a two-layer linear feedforward network, as illustrated in Figure 6a. This model is, in fact, the original perceptron network presented by Widrow and Hoff in 1960 (Widrow and Hoff, 1960), except that linear rather than threshold neurons are used. In the parallel modeling approach, the linear dynamic model is equivalent to a two-layer external recurrent network, as shown in Figure 6b. Therefore, the generalized delta rule for training a neural network can be used for training the linear dynamic operator in a Hammerstein model. First, one defines the objective function as given in eq 30. Then, the gradient of E with respect to each of the weights (M) can be calculated at each neuron in the same localized fashion as in section 3.2.
For the one-step predictive model (eq 25), the deltas are simply the prediction errors, \delta_j(t) = \hat{y}_j(t) - y_j(t); for the long-term predictive model (eq 26), the deltas are obtained by propagating these errors backward through the recurrent connections, which is similar to the back-propagation-through-time algorithm (Su et al., 1992), except that \sigma'[\cdot] = 1 in this case because the neurons are linear. Here, the index

i = N_y(\tau - 1) + j \qquad (36)

denotes that the jth network output, delayed by \tau samples, is connected recurrently to the ith input neuron of the network. Now, using the notation of ordered derivatives, the conventional derivatives of \hat{y}(t) with respect to each element a_{ij} (\in A_\tau) and b_{ij} (\in B_\tau) can be calculated as follows:

\frac{\partial \hat{y}_i(t)}{\partial a_{ij}} = y_j(t-\tau) - [W h(t-K_u)]_j \qquad (37)
and
\frac{\partial \hat{y}_i(t)}{\partial b_{ij}} = h_j(t-\tau) - h_j(t-K_u) \qquad (38)

respectively. For the long-term predictive model, the derivation is identical except that the conventional derivatives of \hat{y}(t) with respect to each element a_{ij} \in A_\tau are
\frac{\partial \hat{y}_i(t)}{\partial a_{ij}} = \hat{y}_j(t-\tau) - [W h(t-K_u)]_j \qquad (39)

Once the gradient is obtained, one can use a gradient method or a conjugate gradient approach to train the linear dynamic model.
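To make the substitution idea of eqs 29-31 concrete, the following sketch eliminates B_Ku through the constraint and fits only the remaining coefficients; it reuses the `predict` helper sketched in section 4.1 and lets a general-purpose optimizer with numerical gradients stand in for the analytic localized gradients derived above, so it is an illustration of the constraint handling rather than the paper's training algorithm itself.

```python
import numpy as np
from scipy.optimize import minimize

def fit_constrained_arx(W, h, y, Ky=3, Ku=3):
    """Fit the linear dynamic operator subject to the steady-state constraint
    (I - sum_tau A_tau) W = sum_tau B_tau (eq 29) by eliminating B_Ku (eq 31)."""
    Ny, Nh = y.shape[1], h.shape[1]
    n_free = Ky * Ny * Ny + (Ku - 1) * Ny * Nh

    def unpack(theta):
        a_part, b_part = theta[:Ky * Ny * Ny], theta[Ky * Ny * Ny:]
        A = [a_part[k * Ny * Ny:(k + 1) * Ny * Ny].reshape(Ny, Ny) for k in range(Ky)]
        B = [b_part[k * Ny * Nh:(k + 1) * Ny * Nh].reshape(Ny, Nh) for k in range(Ku - 1)]
        B_last = (np.eye(Ny) - sum(A)) @ W - sum(B, np.zeros((Ny, Nh)))   # eq 31
        return A, B + [B_last]

    def sse(theta):
        A, B = unpack(theta)
        y_hat = predict(A, B, h, y, parallel=False)   # series-parallel prediction (eq 25)
        return np.sum((y_hat - y) ** 2)

    result = minimize(sse, np.zeros(n_free), method='L-BFGS-B')
    return unpack(result.x)
```

Because B_Ku is always recomputed from the current A and B coefficients, the fitted model satisfies y(n) = W h(n) at steady state no matter where the optimizer stops, which is exactly the property the constraint is meant to protect.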
5. Modification for Wiener Models

The Hammerstein modeling approach discussed above is useful for systems with a nonlinear gain and constant poles. For cases where a nonlinear system cannot be well represented by a Hammerstein model, Haber and Unbehauen (1990) have reviewed several alternatives, including Wiener models and Hammerstein-Wiener models. Figure 7 illustrates the structures of a Wiener model and a
Hammerstein-Wiener model. Unlike a Hammerstein model, a linear dynamic operator precedes the nonlinear static operator in a Wiener model. In a Wiener model, the nonlinear operator transforms the intermediate variables, which have a linear dynamic relationship with the input variables, into the output variables. A Hammerstein-Wiener model is a hybrid of a Wiener and a Hammerstein model, where a nonlinear static operator is placed between two separate linear dynamic operators.

Figure 7. A Wiener model and a Hammerstein-Wiener model are both alternatives to a Hammerstein model.

In case a Wiener model is more appropriate than a Hammerstein model for a specific nonlinear system, one can apply the same concept discussed for a Hammerstein model to train the Wiener model. Because the configuration of a Wiener model is different from that of a Hammerstein model, however, the approach developed for a Hammerstein model must be modified. The key modification is that the procedure for determining the intermediate variable is different. First, a neural network is trained on the steady-state data as in the Hammerstein modeling approach. The nonlinear static model after training can be expressed as follows:

y(n) = P(u(n)) \equiv W \sigma(Vu(n) + \nu) + \omega \qquad (40)

Then, the next step is to find the intermediate variable, which has a linear dynamic relationship with the plant input variable. The determination of the intermediate variable, however, is not so straightforward as for a Hammerstein model, for the following reason. If the nonlinear operator P is invertible (i.e., P^{-1} exists), one can find the intermediate variable, denoted p(t), from the given output, y(t), by inverting the nonlinear operator so that p(t) = P^{-1}(y(t)). In many cases, however, the nonlinear operator is not invertible. For example, the inverse of an operator with more independent variables than dependent variables is not unique (i.e., does not exist), even if the operator is linear. In the cases where P is not invertible, the intermediate variable can be determined in an approximate fashion. Under the circumstance that the plant is perturbed in a small range around a normal operating condition, one can determine the intermediate variable as follows:

p(t) \approx u(t) + \lambda(t)\,\nabla_u P(u(t)) \qquad (41)

so that \lambda(t) is determined for each sample, \{u(t), y(t)\}, by a one-dimensional line search in the gradient direction, \nabla_u P(u(t)), which can be easily calculated using the ordered derivatives discussed previously, such that

y(t) \approx P(p(t)) \qquad (42)

After all p(t)'s are determined, one then trains a linear model to learn the dynamics between u(t) and p(t). Similarly, the steady-state nonlinearity must not change after training. Therefore, one has to derive a constraint for training the linear dynamic model so that the nonlinearity remains unchanged. For a one-step-ahead predictive model

\hat{p}(t) = \sum_{\tau=1}^{K_p} C_\tau p(t-\tau) + \sum_{\tau=1}^{K_u} D_\tau u(t-\tau) \qquad (43)

one formulates the following problem for training this model:

\min_{C_\tau, D_\tau} \; E = \frac{1}{2} \sum_{t=1}^{T} \sum_{j=1}^{N_u} [\hat{p}_j(t) - p_j(t)]^2 \qquad (44)

In order to retain the nonlinearity learned by the neural network, the following relationship between the input and output from the steady-state data must hold:

p(n) = \sum_{\tau=1}^{K_p} C_\tau p(n) + \sum_{\tau=1}^{K_u} D_\tau u(n) \qquad (45)

Since p(n) = u(n) (cf. eq 40 and eq 42), one obtains a constraint for the linear model as follows:

I_{N_u} - \sum_{\tau=1}^{K_p} C_\tau = \sum_{\tau=1}^{K_u} D_\tau \qquad (46)

which can be rewritten as

D_{K_u} = I_{N_u} - \sum_{\tau=1}^{K_p} C_\tau - \sum_{\tau=1}^{K_u - 1} D_\tau \qquad (47)

Hence, eq 43 can be rewritten as

\hat{p}(t) = \sum_{\tau=1}^{K_p} C_\tau p(t-\tau) + \sum_{\tau=1}^{K_u - 1} D_\tau u(t-\tau) + \left( I_{N_u} - \sum_{\tau=1}^{K_p} C_\tau - \sum_{\tau=1}^{K_u - 1} D_\tau \right) u(t-K_u) \qquad (48)

Using the notation of ordered derivatives, the conventional derivatives of \hat{p}(t) with respect to each element c_{ij} (\in C_\tau) and d_{ij} (\in D_\tau) can be calculated as

\frac{\partial \hat{p}_i(t)}{\partial c_{ij}} = p_j(t-\tau) - u_j(t-K_u) \qquad (49)

and

\frac{\partial \hat{p}_i(t)}{\partial d_{ij}} = u_j(t-\tau) - u_j(t-K_u) \qquad (50)

respectively. In the case that a long-term predictive model (linear recurrent model) is used,

\hat{p}(t) = \sum_{\tau=1}^{K_p} C_\tau \hat{p}(t-\tau) + \sum_{\tau=1}^{K_u - 1} D_\tau u(t-\tau) + \left( I_{N_u} - \sum_{\tau=1}^{K_p} C_\tau - \sum_{\tau=1}^{K_u - 1} D_\tau \right) u(t-K_u) \qquad (51)

and the derivation is similar, except that the derivatives of \hat{p}(t) with respect to each element c_{ij} (\in C_\tau) become

\frac{\partial \hat{p}_i(t)}{\partial c_{ij}} = \hat{p}_j(t-\tau) - u_j(t-K_u) \qquad (52)

From the above discussion, the derivation of the Wiener modeling approach is similar to that of the Hammerstein approach, except that the determination of the intermediate variable is different and the static nonlinearity might not be invertible.
6. Case Study: A Polymer Reactor
In order to demonstrate the modeling techniques developed in this paper, a polymerization process is studied.
Figure 8. Continuous copolymerization process (simplified): a mixture of monomers and a catalyst stream are fed to the reactor, and polymer is withdrawn from the outlet.
Figure 9. Polymer property plotted against residence time, monomer ratio, catalyst injection rate, and temperature setpoint. The process is nearly linear in the vicinity of the normal operating conditions, which are indicated by the square box in each panel.
A complex simulation program developed by Kim and Choi (1991) is modified and used as the plant. The process parameters of the simulation program do not change over time, and the process does not have multiple steady states in any of the operating regions under consideration. In order to account for the difficulty of data collection in actual operation, the plant is perturbed in a small region around a predefined normal operating condition. A neural network Hammerstein model is assumed to be appropriate for modeling the process. The simulated process is a continuous copolymerization process. Figure 8 shows a simplified schematic diagram of the process. Two streams are fed to the reactor: one carries a mixture of two monomers as reactants, and the other carries the catalyst. The process also contains a control loop, in which the catalyst flow is used to control the reactor temperature (T1) by a PI (proportional and integral) controller. An on-line measurement of the polymer property is located at the outlet stream. Here, the polymer property is assumed to be the only product quality indicator. The desired polymer property (denoted as y) is maintained by manipulating the feed flow rate of the mixture (u1), the ratio of the two monomers in the mixture (u2), and the reactor temperature set point (u3), which is
controlled by the PI controller. As a result, the simplified process is a 3 x 1 system. A first-principles model developed by Kim and Choi (1991), combined with a PI controller, is simulated as the plant. Kim and Choi's model is a very complex model, which contains more than 40 highly coupled nonlinear differential equations describing the fundamental polymerization mechanism. The PI controller, which is properly tuned a priori, is implemented as indicated in Figure 8. If sufficient steady-state data are given, one can investigate the steady-state behavior of the plant. The polymer property is plotted in Figure 9 against each input variable, with the other two inputs kept constant at the normal operating condition. Notice that all input variables are scaled from 0 to 1, and 0.5 is the normal operating condition. One can easily see from Figure 9 that the polymer property has an almost linear relationship with each of the three input variables in the vicinity of the normal operating condition. The normal operating condition is located at the center of each axis in all figures. However, as shown by the solid lines in Figure 10 (the dashed lines, which are very close to the solid lines and essentially invisible, will be discussed later), where the polymer property is plotted against each pair of the three input variables in three-dimensional plots, significant nonlinearity does exist if the plant is operated far away from the normal operating condition. Further, a set of transient data is collected from a plant test. The plant test is carried out by perturbing each input with a pseudorandom signal around the normal operating condition. The random signal has a zero mean with a maximum amplitude of 0.2 (20% of the scaled value), so each input variable ranges from 0.3 to 0.7. The PI controller, which is properly tuned, is on closed-loop control during the plant test. The plant output and each of the inputs are plotted in Figure 11. The transient data contain essentially linear information. To verify this point, one can run the nonlinearity test discussed previously on the transient data. As shown in Figure 12, the three nonlinear correlation functions between the output and each of the three inputs are nearly zero for all lags, while the linear correlation functions are nonzero. Therefore, a linear dynamic model should be sufficient to fit the data. A linear dynamic model can be expressed as follows:

\hat{y}(t) = \sum_{\tau=1}^{K_y} A_\tau y(t-\tau) + \sum_{\tau=1}^{K_u} B_\tau u(t-\tau) \qquad (53)
Figure 13a shows the results of a third-order (K_y = K_u = 3) linear dynamic model on the testing data after training. Since the linear model has a gain matrix

K = \left( I_{N_y} - \sum_{\tau=1}^{K_y} A_\tau \right)^{-1} \sum_{\tau=1}^{K_u} B_\tau \qquad (54)
which is constant after training, the model is valid only in the vicinity of the normal operating condition. Since both transient data and steady-state data are given, one can use the neural network Hammerstein model discussed above to model the process. First of all, a neural network with three input units and one output unit is used to learn the steady-state data using the static modeling approach. The number of hidden units is five in this case study. After training, the result of the neural network is shown in Figure 10 by the dashed lines, which are very close to the solid lines. The neural network is almost perfect at learning the nonlinear functionality of the process at steady state.
Figure 10. Polymer property as a function of each pair of inputs: (a) u1 and u2, (b) u1 and u3, (c) u2 and u3. Significant nonlinearity does exist in conditions other than the normal condition.

Figure 11. Input and output data from a plant test around the normal operating condition: (a) plant output, (b)-(d) plant inputs 1-3.
If the Hammerstein modeling approach is chosen, one first converts the input variables, u(t), into the intermediate variables, h(t), using eq 22. Then one uses eq 26 to learn the dynamics between h(t) and y(t). A third-order linear dynamic model is used here. As shown in Figure 13b, the neural network Hammerstein model after training is able to predict the dynamic behavior of the process very well. The results shown in Figure 13 are for long-term predictions. The root mean square (RMS) prediction errors of the two models are given in Table 1. The Hammerstein model is superior to the linear model in predicting the testing set. In order to test the validity of the neural network Hammerstein model, another set of transient data is generated when the plant is operated in an operating condition other than that where the model is trained.
Figure 12. Nonlinearity test using linear and nonlinear correlation functions, plotted against lag (0-40).
Here, each variable ranges from 0.1 to 0.5. The Hammerstein model, which was trained on the nearly linear transient data (together with the nonlinear steady-state data), is used for prediction. As shown in Figure 14, the Hammerstein model performs very well on these new data. Using the steady-state data, the Hammerstein modeling technique leads to a nonlinear model even though the transient data do not contain nonlinear information. As shown in Figure 14, the Hammerstein model is superior to the linear model in predicting the polymer property in regions other than the normal operating region.
Figure 13. Results on the testing data of (a) the linear dynamic model and (b) the neural network Hammerstein model, compared with the plant output. Notice that the plot is on a semilog scale.

Table 1. RMS Prediction Errors of the Hammerstein Model and the Linear Model (for Long-Term Predictions)^a

model          training    testing
Hammerstein    0.538       0.548
linear         0.610       0.732

^a The RMS errors are calculated in engineering units.
Figure 14. The resulting Hammerstein model used in another operating region: (a) linear dynamic model, (b) neural network Hammerstein model. Notice that the plot is on a semilog scale.
The RMS prediction errors for the Hammerstein model and the linear model are 0.16 and 0.27 (engineering units), respectively. Notice that because of the logarithmic scale these prediction errors appear smaller than those produced around the original operating region.

7. Summary

From the above example, the modeling approach developed here is useful for modeling a nonlinear process, even if nonlinear information is not contained in the transient data. As shown for this polymerization process, using the steady-state information, the Hammerstein model is able to mimic the dynamic behavior of the nonlinear process even though the transient data contain only linear information. The resulting model can be used not only in the vicinity of the normal operating conditions but also in other operating regions. The neural network Hammerstein modeling approach developed in this paper simplifies the task of modeling a nonlinear dynamic process. By integrating a static neural network with a linear dynamic model, the task of modeling a nonlinear dynamic process becomes a task of training a nonlinear static model and a task of training a linear dynamic model. In training the nonlinear static model, abundant and easily accessible steady-state information can be fully utilized. In practice, this approach is useful especially when the transient data of a nonlinear process contain only linear information.
Nomenclature

All vectors are column vectors. A lowercase variable with subscripts i, j, and/or k stands for an element of a vector or a matrix.

A, B, C, D: coefficient matrices for a linear dynamic model
E: error function for training a (network or linear) model, a scalar
G(z): arbitrary transfer function
h(n): output of the hidden layer of a three-layer network for the steady-state data, in R^{N_h}
h(t): output of the hidden layer of a three-layer network for the transient data, in R^{N_h}
I_n: (n x n)-dimensional identity matrix
i, j, k: subscripts indicating the indices of elements in a vector or a matrix
K: gain matrix of a linear dynamic model
K_y: order of the autoregressive terms of a dynamic model
K_u: order of the exogenous input terms of a dynamic model
l: subscript or superscript indicating a layer of a multilayer perceptron network
M: coefficient matrix of the linear dynamic models
N: total number of samples in the steady-state data set
N_h: number of neurons in the hidden layer
N_u: dimension of the input variable (or manipulated variable)
N_x: number of input neurons of a network, N_x = N_y K_y + N_u K_u
N_y: dimension of the output variable (or controlled variable), or the number of neurons in the output layer of a network
P: arbitrary static nonlinear operator, N_y-dimensional vector-valued; P_i is an element of P
p: intermediate variable of the Wiener model
p̂: predicted intermediate variable of the Wiener model
T: total number of samples in the transient data set
u(n): scaled input variable of the steady-state data, in R^{N_u}
u(t): input variable of the transient data, in R^{N_u}
ũ: unscaled input variable of the steady-state data, in R^{N_u}
V: weight matrix between the input and the hidden layers of a three-layer network, in R^{N_h x N_u}
W: weight matrix between the hidden and the output layers of a three-layer network, in R^{N_y x N_h}
x: input to the neural network, in R^{N_x}
y(n): scaled output variable of the steady-state plant data, in R^{N_y}
y(t): output variable of the transient data, in R^{N_y}
ŷ(t): predicted output from the linear dynamic model, in R^{N_y}
Y(n): scaled output variable of the steady-state data, in R^{N_y}
ỹ: unscaled output variable of the steady-state data, in R^{N_y}
ȳ: sample mean of the output variable, in R^{N_y}

Symbols

α, β: constants of a sigmoidal function
δ^l: the "delta" of the lth layer of a neural network, in R^{N_l}
φ_u²: sample variance of the unscaled input variable, in R^{N_u}
φ_y²: sample variance of the unscaled output variable, in R^{N_y}
ξ^l(n): (σ[s^l_1(n)], σ[s^l_2(n)], ..., σ[s^l_{N_h}(n)])^T, in R^{N_h}
σ: univariate scalar function denoting the activation function of a neuron
σ′: the differential of σ
ω: bias weight of the output layer of a three-layer neural network, in R^{N_y}
ν: bias weight of the hidden layer of a three-layer neural network, in R^{N_h}
s^l: input to the hidden layer of a three-layer neural network, in R^{N_h}
^T: transpose of a matrix
∇: gradient operator

Appendix: Ordered Derivatives

Consider a general function as follows:
E = E(x_1, x_2, \dots, x_n), \qquad x_j = x_j(x_1, x_2, \dots, x_{j-1}), \qquad j = 2, 3, \dots, n \qquad (55)

To calculate the partial derivative of the function E with respect to one particular variable, say x_j, Werbos (1974) has shown that the derivative can be expressed by the ordered derivatives as follows:

\frac{\partial^+ E}{\partial x_j} = \frac{\partial E}{\partial x_j} + \sum_{i=j+1}^{n} \frac{\partial^+ E}{\partial x_i} \frac{\partial x_i}{\partial x_j} \qquad (56)
Although deriving a partial derivative is a fundamental topic in calculus, it becomes tedious when the function is expressed implicitly through many functions composed of subsets of the independent variables, which introduces extra intermediate variables. The ordered derivative therefore offers a convenient way to handle this situation.
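As a brief illustration, consider E = x_1 x_2 with x_2 = x_1^2, so that x_2 is an intermediate variable. The conventional partial derivative is

\frac{\partial E}{\partial x_1} = x_2

while eq 56 gives the ordered derivative

\frac{\partial^+ E}{\partial x_1} = \frac{\partial E}{\partial x_1} + \frac{\partial^+ E}{\partial x_2}\frac{\partial x_2}{\partial x_1} = x_2 + (x_1)(2x_1) = 3x_1^2

which agrees with differentiating E(x_1, x_1^2) = x_1^3 directly.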
Literature Cited

Bhat, N.; McAvoy, T. J. Use of neural nets for dynamic modeling and control of chemical process systems. Comput. Chem. Eng. 1990, 14 (5), 573-583.
Billings, S. A. Identification of nonlinear systems: A survey. IEE Proceedings, Part D 1980, 127, 272.
Billings, S. A.; Fakhouri, S. Y. Identification of a class of nonlinear systems using correlation analysis. IEE Proceedings, Part D 1978, 125, 691-697.
Billings, S. A.; Voon, W. S. F. Correlation based model validity tests for non-linear models. International Journal of Control 1985, 44 (1), 235-244.
Chang, F. H.; Luus, R. A noniterative method for identification using the Hammerstein model. IEEE Transactions on Automatic Control 1971, AC-16, 464.
Cybenko, G. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems 1989, 2, 303-314.
Eskinat, E.; Johnson, S. H.; Luyben, W. L. Use of Hammerstein models in identification of nonlinear systems. AIChE J. 1991, 37 (2), 255-268.
Gallman, P. A comparison of two Hammerstein model identification algorithms. IEEE Transactions on Automatic Control 1975, AC-20, 771.
Girosi, F.; Poggio, T. Networks and the best approximation property. Biological Cybernetics 1990, 63, 169-176.
Haber, R.; Unbehauen, H. Structure identification of nonlinear dynamic systems: A survey on input/output approaches. Automatica 1990, 26 (4), 651-677.
Hornik, K.; Stinchcombe, M.; White, H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 1990, 3 (5), 551-560.
Kim, K. J.; Choi, K. Y. Continuous olefin copolymerization with soluble Ziegler-Natta catalyst. AIChE J. 1991, 37 (8), 1265-1260.
Ljung, L. System Identification: Theory for the User; Prentice-Hall: Englewood Cliffs, NJ, 1987.
Narendra, K. S.; Gallman, P. G. An iterative method for the identification of nonlinear systems using a Hammerstein model. IEEE Transactions on Automatic Control 1966, AC-11 (6), 546-550.
Narendra, K. S.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1990, 1 (1), 4-27.
Rumelhart, D.; Hinton, G.; Williams, R. Chapter 8: Error propagation and feedforward networks. In Rumelhart, McClelland, Eds.; Parallel Distributed Processing; MIT Press, 1986; Vols. I, II.
Su, H. T.; McAvoy, T. J.; Werbos, P. J. Long-term predictions of chemical processes using recurrent neural networks: A parallel training approach. Ind. Eng. Chem. Res. 1992, 31, 1338-1352.
Werbos, P. J. Beyond regression: New tools for prediction and analysis in the behavioral sciences. Ph.D. Thesis, Harvard University, 1974.
Werbos, P. J. Maximizing long-term gas industry profits in two minutes in Lotus using neural network methods. IEEE Transactions on Systems, Man, and Cybernetics 1989, 19 (2), 315-333.
Widrow, B.; Hoff, M. E. Adaptive switching circuits. WESCON Convention Record 1960, IV, 96-104.
Wismer, D. A.; Chattergy, R. Introduction to Nonlinear Optimization: A Problem Solving Approach; North-Holland: Amsterdam, 1978.

Received for review March 15, 1993
Revised manuscript received June 7, 1993
Accepted June 15, 1993.
* Abstract published in Advance ACS Abstracts, August 15, 1993.