nonlinear estimation theory - ACS Publications

in Ordinary Differential Equations. To avoid confusion, it is important to settle on some terminology at the outset. Nonlinear Estimation Theory is me...
0 downloads 0 Views 1MB Size
JOHNH. SEINFELD

NonIinear t stI mati o n Theory Applications include estimation of kin et ic param et ers from experimental reactor data

problem of fundamental importance to the chemical

A engineer is the analysis of system or plant output

data for the purpose of process modeling. Given output operating data, it is desired to construct an adequate mathematical model of the system, often in the form of estimating parameters in the process model. T h e general problem of the estimation of process states and parameters from actual output data is considered for static and dynamic systems. Applications include the estimation of kinetic parameters from experimental reactor data and the estimation of states and parameters for optimal and adaptive control. Estimation theory and its applications are of fundamental interest to the chemical engineer. Whenever a correlation between theory and experiment is tested, techniques for the extraction of useful and meaningful information from experimental data are required. The following broad classes of problems necessitate estimation techniques :

1. The determination of model parameters in nonlinear algebraic models from experimental measurements, e.g., the determination of chemical reaction rate constants from experimental reaction rate measurements. 2. The determination of model states and parameters in nonlinear dynamic models from laboratory and plant output data, e.g., the modeling and simulation of nonlinear dynamic processes from operating data. 3. The on-line modeling of dynamic processes, where a continuous output signal is used to generate instantaneous estimates of the states and parameters in process models. This is important, for example, in the adaptive control of dynamic systems. The application of estimation theory to these classes of problems is receiving much current interest. I n this paper is a unified presentation of what might be called nonlinear estimation theory. I t is hoped to provide the reader with the fundamental concepts employed in estimation theory and to summarize the important 32

INDUSTRIAL A N D E N G I N E E R I N G CHEMISTRY

literature in the area. follows :

T h e paper will be divided as

I.

The Estimation of Parameters in Nonlinear Algebraic Models 11. Estimation of States and Parameters for Nonlinear Dynamic Systems 11.1 Estimation of Parameters from Discrete Noisy Measurements 11.2 Estimation of States and Parameters from Continuous Noisy Measurements 11.2.1 No Dynamical Noise 11.2.2 Additive Dynamical Noise 11.2.3 Sequential Solutions 111. The Accuracy of State and Parameter Estimates , in Ordinary Differential Equations

t

T o avoid confusion, it is important to settle on some terminology a t the outset. Nonlinear Estimation Theory is meant to include those problems in which one has: (1) measurements of a process which may contain errors; and (2) a nonlinear model for the process outputs in terms of the process states, and one desires to estimate some unknown features of the model, using the process output measurements. We assume that the form of the system model is specified a priori, perhaps containing unknown parameters which are to be subsequently estimated. Let us be somewhat more specific, with reference to the organization of the paper. T h e customary terminology for the material in Part I is Nonlinear Parameter Estimation. This is perhaps too inclusive, so we have chosen the title of Part I as above. As we will see, nonlinear parameter estimation can be used to describe the problems of I and 11.1, and this phrase has been retained for the heading of references A which pertain to I and 11.1. I n Part I, we assume that the vector of measurements y is related to the vectors of process states x and unknown parameters k by an algebraic relationship y = f(x,k). I n this class of problems the states of the process are usually known a t each experimental point r , and it is desired to estimate k by matching y r and f(x,,k) in some optimal fashion over the range of experimental points. The material of Part I largely concerns how this matching is carried out. For example, in the experimental determination of reaction rate constants, we may measure the concentrations x and the rate of reaction y in a batch reactor directly so that the process of estimating rate constants k consists of fitting the algebraic rate expressions f(x,,k) to the rate measurements y r . The estimation of parameters in nonlinear algebraic models has received a wealth of study; e.g., see the text of Deutsch ( 3 C ) . I t would be inappropriate to attempt a lengthy review of this area. Thus, the material in Part I is included for its background value and the perspective it provides for the newer material in Part 11. The material in Part I1 represents our main interest, since this material has not appeared previously in a unified form in the literature. I n Part I the process states are known a t each experimental point and parameters in the algebraic process model are to be estimated.

I n Part I1 we deal with situations in which the process states are governed by ordinary differential equations. For example, in our experiment to determine reaction rate constants from batch reactor data, if we are only able to measure the concentrations a t each time, the estimation of rate constants consists of fitting the solutions of the differential rate equations

dx - = f(x,k) dt

+

to the measurements, y. = x7 (experimental errors), a n inherently more difficult task. This problem is treated in 11.1, in which we assume that the measurements have been made at a finite number of points. I n 11.2 we consider the same problem but with measurements made continuously. Two cases are distinguished: (1) errors occurring only in the measurements, and (2) errors occurring in measurements and in inputs to the dynamical process. Measurement errors occur because of inaccuracies in instruments, analysis techniques, etc., and are reflected in the algebraic relationship between the measurements and the process states. I n addition to measurement errors, inputs to the process may contain random fluctuations; e.g., variations in the rate of heating of our batch reactor, which are reflected by stochastic input terms in the differential equations for the states. Two types of solutions to the estimation problem of 11.2 are considerednonsequential and sequential. If we gather all the data before generating estimates, this is the nonsequential approach. If estimates are produced as the data are being gathered, this is the sequential approach. Both types of solutions to the estimation problem are fully discussed in 11.2. I n Part 111 the analysis of the accuracy of state and parameter estimates in nonlinear dynamical models from Part I1 is outlined. Again because of its ready availability (3C), the corresponding analysis for Part I is omitted. Results on parameter estimation have been kept brief purposely because of their availability in the outstanding text of Rosenbrock and Storey (7C). Efforts are made throughout the paper to cite significant theoretical contributions as well as interesting and important practical applications. Undoubtedly, some pertinent references will have been overlooked. I n no way, however, is such a n omission to be construed as an implication of the importance of the work. Estimation of Parameters in Nonlinear Algebraic Models

The first group of problems which we consider are those in which one has a nonlinear algebraic model relating the output of a system to states and parameters of the system. Given output measurements corrupted with errors, it is desired to estimate the unknown parameters in the system model. This class of problems, usually termed nonlinear parameter estimation problems, has received a wealth of study and is important in practically all areas of science and engineering. I n this part we will outline the nature of the problem and VOL. 6 2 NO.

1

JANUARY 1 9 7 0

33

several common solutions. For the most part we will limit literature references cited to applications of nonlinear parameter estimation to chemical engineering problems. LVe consider a process in which an m-vector of dependent variables (outputs) y is related to an n-vector of independent variables x (states) and a p-\.ector of unknown parameters k, according to

R experiments are carried out in which, for a known value of x, y is measured. The measurements of y will contain random experimental errors, so that at each of the R data points

yr

=

+

f(x,,k) (experimental errors) r = 1, 2, . . ., R

(1.2)

It is desired to find the value of the paraniercr \rector k so that the predictions of the model match the experimental data in some best sense. This desire can br formalized by choosing k to maximize an objective function. The deviation between prediction and data is gi\Ten by the residuals

The moment matrix of residuals is K

M(k)

e,(k)e,(k)'

=

where p i is a scalar, R iis a matrix, and [dS/bk),;,,is the gradient vector of S at k = k ( ' ) . Gradient inethods are distinguished in the choice of p , and R i . I n general : 1. R f should be positive-definite and should be an approxiination to -Hi-', the Hessian matrix of S(d2Sl'dk,dk,) at k(i). 2. p 1 should bc chosen so that S[k(l+"]> S[kci,1.

A cornprehensive and excellent review of gradicnt methods for the solution of rionlinear parameter estiination problenis has been presented by Bard ( 7 A ) . For this reason we will only list the inethods Bard has treated with appropriate references. An extensi1.e collection of computational results of the different methods on test problems is included in Bard's report. 1. The Gauss Method. f(x,,k) is expanded in a two-term Taylor series about a current value of k, kci). The correction kCi+l) - k ( * ) is determined from the solution of the resulting linear least squares prohleni ( 5 4 734 7 4 4 . 2. The Marquardt Method. An optiiiiunii interpolation between the Gauss method and steepest-desccnt methods is made, the interpolation being based on thc maxiinuin neighborhood in which the truncated Taylor gives an adequate representation of f(x,,k) ( 1 9 A ) . 3. Variable Metric Methods. From a Taylor serics,

r=l

The general objective function to be maximized has the forni S(k) = *[M(k)I (1.5) The following common objective functions eniplo\ the definitions of \k(IA) : 1. Weighted least squares \k = -Tr[QM(lr)]

(1.6)

2. Maximum likelihood (the errors i n y r are assumed to be normally distributed \vir11 zero mean and kno~711 cox ariaiice matrix V) \k = -1/2Tr[V-'M(k)]

3.

(1.7)

Maximum likelihood (V is unknown)

R 2

\k = -- log det M(k)

(1.8)

The parameters may, in addition, be subject to inequality constraints

rC/,(k) 5 0

t =

1,2, . . ., q

(1.9)

which reflect physical constraints that the value of k must satisfy. For example, if k is a vector of chemical reaction rate constants, we may require that each k , be positive. Most of the algorithms for the determination of k are based on a general gradient method. Given a value of k a t the ith iteration, kc'), the next estimate, k(%+'), is generated by (1.10)

34

I N D U S T R I A L A N D E N G I N E E R I N G CHEMISTRY

(1.11)

If we have a matrix A , and we wish to add a matrix B i so that A { L = ~ A i Bi is an approxiination to ?H-', then Equation 1. l l becomes

+

B r w i = [k(i+') - k ( i ) ] - A iw i

(1.12)

These methods differ in how B i is chosen to satisfy Equation 1.12. a. Rank One Correction. B i is chosen as a syirimetric matrix of rank one ( 8 A ) . h. Davidon-Fletcher-PoTvell Method. B i is choscri to be of rank two (IOA, '2'4). c. Inv,rse Rank One Correction. A i + l is taken as an approxiination to H, and then the same as rank one correction ( I A ) .

In his study on five test problems Bard found the efficiency of these algorithms to be : The Gauss iiiethod (most efficient) The Marquardt method Variable metric rank one methods Davidon-Fletcher-Powell (least efficient). There have been a number of important applications of nonlinear parameter estimation in chemical engineering. One notable area of application has been the

John H. Seinfeld is an Assistant Professor in the Chemical Engineering Laboratory at the California Institute of Technology, Pasadena, Calij. 97709

AUTHOR

specification of experimental schemes to obtain precise parameter estimates in kinetic experiments. Confidence regions on parameter estimates obtained from nonlinear least square analysis of experimental rate data have been used as a basis for distinguishing among rival reaction rate models by several authors. Kittrell and Mezaki (76A, 7 7 4 2OA) have studied extensively the nonlinear least squares analysis of reaction rate models, in which they have considered the question of choosing data points so that precise estimates of the parameters in nonlinear reaction rate models can be obtained. Detailed studies of catalytic rate models using nonlinear parameter estimation have also been carried out by Peterson and Lapidus (78A, 22A, 23A). Nonlinear parameter estimation has been used in studying homogeneous noncatalytic reactions by Box (6A, 7 A ) and Cull and Brenner (9A) and for heterogeneous catalytic reactions by Blakemore and Hoerl ( 4 A ) . There now exist standard computer programs for the estimation of parameters in nonlinear algebraic models, e.g., Booth and Peterson ( 5 A ) , based on the Gauss method, so that the real significance of these algorithms is their utility in model-building and experimental design. II. Estimation of States and Parameters for Nonlinear Dynamic Systems

We now proceed to the second general class of nonlinear estimation problems, namely the estimation of states and parameters in nonlinear dynamic models (described by a set of ordinary differential equations). A basic assumption is that a priori knowledge is available in the form of the ordinary differential equations which describe the behavior of the state of the system and on the form of the algebraic relationship between the measurements and the states. The material in this section is applicable to the wide class of practical situations in which one desires to construct a mathematical model for the dynamics of a plant or system based on input-output operating data. This part is divided into two basic sections: (1) The estimation of parameters in a nonlinear ordinary differential equation model from noisy output data taken a t discrete times. The dynamical inodel is assumed to be free of random inputs. (2) The estimation of states and parameters from noisy output data taken continuously. T h e first section is basically an extension of Part 1 to the case in which the state variables are governed by differential equations. T h e second section includes the cases in which the system is free of or contains additive random inputs. Some comments are necessary concerning the approach to the subject of the estimation of states and parameters for nonlinear dynamic systems from continuous noisy output measurements. As we have noted, we can attempt to solve the estimation problem nonsequentially or sequentially. Of course, the result will depend on the estimation criterion employed. For consistency we have employed a least square criterion throughout the entire development. I n 11.2.1 and 11.2.2 the nonsequential solutions of the cases of no

dynamical crrors and additive dynamical errors are treated. A sequential solution of each problem is considered in 11.2.3. The sequential solution of a nonlinear estimation problem is usually termed a nonlinear filter, the state estimate a t the last time of observation is called the filtering estimate, and if such an estimate is produced continuously as observations are made, we are in essence continuously filtering the system. The designation for this type of solution is nonlinear filtering or nonlinear sequential estimation. Thus, the filtering problem is concerned with estimating the current state of a dynamical system based on all past and present measurements of some random function of the state. The first significant contributions in the area of filtering are those of Kalman and Bucy (77B, 78B),in which the optimal minimum variance estimate of the state variables of a linear dynamic system with white noise disturbances was found as the solution of an initial value problem of ordinary differential equations, the so-called Kalman filter. The problem of nonlinear filtering has been studied since about 1962. There are essentially two approaches to this problem. The first approach is basically an extension of the work of Kalman and Bucy, in which variational methods or optimal control theory are used to obtain solutions to the estimation problem. Different criteria, usually minimal-variance or least squares, have been used to derive approximate nonlinear filters. Most of these techniques require a t some point a Taylor series expansion neglecting second and higher order terms. Authors who have used this approach include: Cox ( 7 B ) , who used a dynamic programming formulation ; Detchinendy and Sridhar (9B) and Bellman et al. (3B), who used variational methods to derive a two-point boundary-value problem which is solved by invariant imbedding: Friedland and Bernstein (7OB); Mowery (25B); Ho and Lee (75B); Bryson and Frazier ( 4 B ) ; and Larsen et al. (23B). The second approach is based on the exact equations satisfied by the conditional probability density functions and conditional expectations (34B). The stochastic Ito calculus is required in this approach. I t can be shown that the optimal filter cannot be realized by a finitedimensional system. Finite-dimensional approximations have been employed to derive suboptimal nonlinear filters by Kushner (79B-ZZB), Bucy (6B), Jazwinski (76B), Bass et al. (ZB),and Denham and Pines (823). Because the second approach requires mathematics which is not commonly used by most chemical engineers, we will convert the nonlinear filtering problem into a deterministic one and employ the calculus of variations. This approach has the dual advantages of consistency and simplicity. The particular approximate nonlinear filters derived in 11.2.3 are based on a least squares estimation criterion and have been taken from Bellman et al. (3B) and Detchinendy and Sridhar (9B). Obviously other estimation criteria and approximations will lead to different filters, the performance of which will vary somewhat depending on the situation. A computational comparison of eight VOL 6 2

NO. 1 J A N U A R Y 1 9 7 0

35

different nonlinear filters has been presented by Schwartz and Stear (3UB). For background information on linear estimation and filtering the reader may consult the books , Deutsch ( 3 C ) . of Bryson and Ho (ZC), Lee ( K ) and 11.1 Estimation of Parameters from Discrete Noisy Measurements. The problem can be posed as follows. The system is described by the vector ordinary differential equation

x

=

f(t,x,k)

x(0)

(2.1)

xo

=

where x is the n-dimensional state vector and k is the p dimensional parameter vector (constant). The output of the system is the m-dimensional vector y . Measurements of the output are made a t R times, tl, t ~ ,, . ., t,. The measurements are related to the state according to

y,

+ (measurement errors)

h[t,,x(t,)]

=

r

=

The general solution of this linear ordinary differential fi equation is the sum of a particular solution and n homogeneous solutions

+

z(’)(t) =

I(y7

+ p“’(t)

W)(t)OL(1)

(2.9)

where @ ( I ) ( t ) is the solution of the matrix homogeneous equation (2.10)

1, 2 , . . ., R

W ( 0 ) = I

R

=

integrated from t = 0 to t = t,. This first guessed solution may be indicated by z(O)(t). Xext, Equation 2.6 is linearized to first order about zco)(t)

(2.2)

Given the observations y . we want to fit the model, Equation 2.1, to the data y., r = 1, 2, . . ., R, in some optimal way. No assumptions concerning the statistical nature of the measurement errors will be made. As usual, a least squares criterion will be used to fit the output of the model to the measured data. If we assume the initial state of the process xo is known, we desire to select the parameters k to minimize the least-squares criterion (if xo is unknown we will want to select both xo and k)

S

.An initial guess of the unknown parameters, k , = Z , + ~ ( O ) , i = 1, 2, . . , p , is made and Equation 2.6 is

- h[t,,x(&)1lj2Q7

(2.3)

r= 1

where Q. is an m X m positive semidefinite weighting matrix incorporating the relath-e accuracy of each component of the measurement vector at each time of measurement. At this point, let us adjoin to Equation 2.1 the fi equations

and p(’)(t)is the solution of

The most general problcm is when not only k but also xo is unknown. In this case, the initial conditions for Equation 2.11 and ~(”(0)= 0. Then, substituting t = 0 into Equation 2.9, we obtain

z(”(o) =

(2.12)

so that the a i ( i )represent the new estimates of x i o , i = 1, 2, . . ., n, and k,, i = 1> 2, . . . , p . The a i ( 1 )arc determined such that S is rniniinized

The inore coinillon problem is when x o is known and k is to be estimated. The initial conditions for Equation 2.11 are best chosen as

k = O and define the n

+ p vectors z and g by

=

[;I

g =

[;]

N o w Equations 2.1 and 2.4 can be written =

g

I n this case = . . . = a n ( 1 )= 0 so that, instead of Equation 2.10, it is only necessary to solve the p independent homogeneous vector solutions of

w 1, 2, . . ., n

Zi(0)

=

xi0

i

=

Zi(0)

=

?

2

= n+

1,

...,n + p

\Ve now desire to determine z i ( 0 ) , i = n to minimize

+ 1, . . .

4 2.1 . = 6 a.i. The total solution

=

C

- h[t,,

1 1 ~ 7

=

1, 2,

,

ti(t~)l~IzQ,

. ., n

A convenient technique for the minimization is quasilinearization ( 2 4 IC, 5C). In essence, we replace the nonlinear problem by a sequence of linear problems. 36

1,2, . . . , n + p

is then

=

p‘”(t)

+ c+ P

ai&

(2.16)

i = n f l

r=l

i

z(’)(t)

=

n

z(”(t)

K

S

j

INDUSTRIAL A N D E N G I N E E R I N G CHEMISTRY

From Equations 2.14 and 2.15 it can be readily seen that a r t + l ( l )., . ., an+p(ljrepresent the new estimates of k l , . . , , kT>. The ai(1)are determined to minimize S

bS bff 7

=

0

i

=

n + 1,

...,n + p

(2.17)

If h[t,x(t)] is a nonlinear function of x, Equations @ and @ 2.13 and 2.17 will lead, respectively, to nonlinear algebraic equations for the 016. If h[t,x(t)] = H ( t ) x ( t ) , H ( t ) a n m x n matrix, then linear algebraic equations are obtained for the ai. Once the as(') are obtained, these values are used as the next guess of the initial conditions and the algorithm is repeated until subsequent changes in the 01% are less than some prescribed criterion. Thus, the algorithm is used as follows:

+

1. Select a first guess zCo)(O) and integrate Equation 2.6 to t = tR. If xo is known, guesses need only be made for z,(O), i = n 1, . . ., 12 p . Denote the solution by z(O)(t). 2. Solve Equations 2.10 (or 2.15) and 2.11 to obtain r$(')(t) and p")(t). 3. Solve the algebraic equations from the minimization of S to determine 4. Generate z ( l ) ( t )from Equation 2.9 (or 2.16). 5. Return to step 2 with z ( ' ) ( t )in place of z @ ) ( t ) . ~ less than 6. Continue until /z@fl)(O) - Z ( ~ ) ( O )is some criterion.

+

+

Computational problems can arise in two places in the algorithm: Nonconvergence because of a poor initial guess, and difficulty in solution of the algebraic equations because of ill-conditioning. I t is difficult to determine if either of these points will be important in a particular problem before actual computation is carried out. A modification of the algorithm to improve the convergence characteristics for initial guesses has been proposed by Donnelly and Quon ( 7 7 A ) . Computations carried out to determine reaction rate constants from raw kinetic data have been presented by Bellman et al. ( 2 A ) , Lee (5C), and Seinfeld and Gavalas (25A). T h e algorithm can be used in kinetic studies for such questions as: (1) T h e effect of the level of experimental error on the parameter estimates; (2) The effect of the number and distribution of data points on the parameter estimates; and (3) T h e concentrations to measure, if a choice is available, for the best parameter estimates. Such studies can be carried out on the computer prior to actual experimentation, as a guide to experimental design. 11.2. Estimation of States and Parameters from Continuous Noisy Measurements. I t will now be assumed that the observations of the process are made continuously over the interval t o 5 t 5 T ,

y(t)

=

h[t,x(t)]

+ (measurement error)

(2.18)

The dynamics of the process can be noise-free or include noisy dynamical inputs, namely: No dynamical noise X

= f(t,x,k)

(2.19)

Additive dynamical noise X

= f(t,x,k)

+ (dynamical error)

(2.20)

T h e form of Equation 2.20 is somewhat restrictive, in

that additive errors can only be rationalized in two cases: Either the random inputs to the system truly appear additively on the right-hand side of the dynamical equations, or the random inputs are small enough so that a first-order linearization is valid, enabling one to lump all uncertainties in the term (dynamical error). There are two differences from the general problem of the previous section :

1. T h e output measurements are made continuously. 2. There exists the possibility of dynamical errors entering as forcing terms in the differential equation model of the system. Based on knowledge of y ( t ) in the interval to _< t 5 T , it is desired to estimate the state of the process a t some time tl. If:

5

to

tl

< T , the interpolating estimate

tl = T ti

the filtering estimate

>T

the prediction estimate

If we adjoin Equation 2.4 to Equation 2.19 or Equation 2.20, parameters can be considered as states, the new system dynamic models being z = g(t,z)

+ (dynamical error)

(2.21)

where z and g are defined in Equation 2.5, and the (dynamical error) term is zero, corresponding to Equation 2.19. As before, we will use a least square estimation criterion. Let any admissible estimate be denoted by E(t), t o 5 t _< T I , T I 2 T, which is continuous with piecewise continuous derivatives. With no dynamical noise the estimation criterion is

J[WI= Ilz(to) - mol12Po +

loj/r(O TI

- h(t,E)l/2Q(t)dt (2.22)

and with dynamical noise the criterion is

~ [ z ( t )=I Ijz(to>

Lo

- mol12Po+

Ti

flly

- h(t,z)/pq(i)3- 112

-

g(t>~)lI2rl&t (2.23)

where Z(to) is a n estimate of the initial state of the process and mo is the expected value of the initial state. If the initial state is known, say x(0) = XO, the first term on the right-hand side of Equations 2.22 and 2.23 would be dropped. T h e use of T I enables us to include the prediction problem. The optimal smoothing estimate G ( t ) minimizes J [ z ( t ) ] . R(t) is a symmetric, positivep ) X (n p ) matrix and POis an (. @> definite (. x (n p ) constant, symmetric, positive-definite matrix. Since y ( t ) is unknown for t > T , Q(t) = 0 for t > T in Equations 2.22 and 2.23. If we assume the estimation problem can be solved for T I = T , the solution of the prediction problem is

+

+

+

P

= g(t,z) VOL. 6 2

+

T 5

t

NO.

1

5 TI

(2.24)

JANUARY 1970

37

with the initial condition the optimal filtering estimate a t t = T . Thus, we will only consider the estimation ’ problem for T1 = T . 11.2.1 No DYNAMICAL NOISE. We want to find that trajectory ;(t), t o 5 t 5 T , which minimizes J [ z ( t ) ] given by Equation 2.22 with T = T I . This is a problem in the classical calculus of variations (4C). The necessary conditions are given by the Euler-Lagrange equations,

g(t,z>

(2.25)

2hZTQ[y(t) - h ( t , z ) ] - gZTX

(2.26)

=

i\

=

The boundary conditions for Equations 2.25 and 2.26 are obtained from the transversality conditions

x ( t o ) = 2Po[z(t0) - mol

(2.27)

X(T) = 0

(2.28)

Equations 2.25-2.28 represent the two-point boundary value problem, the solution of which yields the optimal smoothing estimate 2 ( t ) . These equations are nonlinear and must be solved numerically. The numerical solution of Equations 2.25-2.28 for a fixed value of T will yield the optimal smoothing estimate % ( t ) , which is the nonsequential solution. That is, we have gathered all the data y ( t ) , to4< t 5 T , before we produce the estimates of the states and parameters ~ ( t ) t ,o 5 t 5 T . These estimates are obtained from the solution of Equations 2.25-2.28. Another alternative is to ask for sequential estimates, i.e., to have 2 ( t ) continuously as y ( t ) is observed. This requires the solution of Equations 2.252.28 for all values of T 2 to. Let us first consider the nonsequential solution: We rewrite Equations 2.25 and 2.26 as

i = e(t,z,x)

(2.29)

= O(t,z,N

(2.30)

where O(t,z,h) and p(t,z,A) are the right-hand sides of Equations 2.25 and 2.26. A convenient method of solution is quasilinearization, in which, as we have already seen, the nonlinear two-point boundary value problem is replaced by a sequence of linear two-point boundary value problems. Denote the solution a t the kth iteration by

+

i@+i) = e [ t , Z ( k ) , ~ ( R ) ]e m [ t , z ( k ) , ~ (x k)] [,(k+1)

X(”1)

+

- Z ( ~ ) l ex[t,z(”,x(k)][x(k++” - ~

+

= p[t,Z(k),X(k)] p.[t,z(”,X(k)]

[z(’+l) -

~ ( ’ ” ’+ 1 PX[~,Z(”,X(”)I[X(”+’)

( k ) ]

x - X‘”]

(2.31)

X‘”’”(T)

=

0

@(‘+I)

&to)

= I

+

and is an arbitrary 2(n p ) dimensional vector of constants and p@+l)(t)is any particular solution of Equations 2.31 and 2.32. The a(’ +I) are determined to satisfy the boundary conditions, Equations 2.33 and 2.34. It is convenient to use as the initial condition for the particular solution p(IC+‘)(t), p(’+l)(to) = 0. Then, at t = to, Equation 2.37 becomes (2.37) where r is a vector containing the first n f p components p of and s is a vector containing the last n components of Equation 2.35 yields the n p relations

+ +

2P0(r - mo) = s

(2.33) (2.34)

T h e general solution of Equations 2.31 and 2.32 is

At t = T , Equations 2.34 and 2.35 become

+

The vector = (r,s)Tcontains 2(n p ) unknowns; n fl equations are given by Equation 2.38. T o obtain p equations, we use the last n p relathe other n tions of Equation 2.39. If we partition @@+l)(t,to) into ( n p ) X (n p ) submatrices

+

+

+

+

+

and p(Ic+I)(t)into 2(n

+ p ) vectors (2.41)

then the last n

+ p relationships of Equation 2.39 are

+ 422(T,to)s + P Z @ + ” ) ( T )(2.42)

0 = h(T,to)r

+

Equations 2.38 and 2.42 constitute 2(n p ) linear = (r,s)T. These equations for the components of equations can be written in the form

To determine (r,s) it is necessary to invert the matrix on the left-hand side of Equation 2.43. As in the case of parameter estimation from discrete data, one of the key problems lies in the accuracy with which the matrix inversion can be carried out in the solution of the linear algebraic equations for C Y . 11.2.2 ADDITIVEDYNAMICAL NOISE. We want to T , which minimizes find that trajectory ;(t), to 5 t J [ z ( t ) ] given by Equation 2.23 with T = T I for the


l

(2.60) (2.44)

where u(t) = z - g ( t , z ) . The Euler-Lagrange equations

Taking partial derivatives with respect to c on both sides of Equations 2.53-2.56 : (2.61)

yield Z

=

g ( t , z ) - 1/2R-’X

-

(2.46)

- gZTX

= 2hZTQ[y(t) h ( t , z ) ]

( t o , T ,c ) = 2Po

bC

(2.47)

(E)

The transversality conditions give the boundary conditions X ( t 0 1 = 2Po[z(O)

- mol

(2.48)

h(T) = 0

(2.49)

If we let the right-hand sides of Equations 2.46 and 2.47 be denoted by e(t,z,X) and P(t,z,X), then the algorithm of the previous section can be applied directly to detert T. mine the nonsequential solution s(t), to 11.2.3 SEQUENTIAL SOLUTIONS.Both the cases of no dynamical noise and additive dynamical noise yield a two-point boundary value problem described by Equations 2.27-2.30. T o produce sequential estimates it will be necessary to solve this boundary value problem for all values of T 2 to. Invariant imbedding (33B, 5C) represents a convenient method of doing so. In order to construct the solution, we imbed the original boundary value problem in a class of problems in which the terminal condition Equation 2.28 is replaced by the more general condition,

(2.63) (2.64)

Interchanging the order of differentiation in Equations 2.57, 2.58, 2.61, and 2.62, we see that the vector

” ” and each column of the matrix (E’=)

<
+ ex ($)

e, bT

Taking partial derivatives of the definition of r ( T,c)with respect to T yields

br(T,c) - - - dz(T,T,c) bT at

+

bz(T,T,c) bT

(2.69)

and similarly, taking partial derivatives with respect to c

br ( T , c ) bz( T ,T,c) -bC

ac

(2.70)

Combining Equations 2.29, 2.65, 2.69, and 2.70, we obtain

(2.57) (2.71) The initial condition for Equation 2.71 is obtained from V O L 6 2 NO,

1

JANUARY 1970

39

T

Equation 2.48 by setting to =

r(t0,c) = 1/2PO-k

+ mo

(2.72)

T h e optimal filtering estimate is z(T,T,O) = r(T,O), which is obtained from the solution of Equation 2.71 subject to Equation 2.72. Thus, the two-point boundary value problem has been converted into a n initial value problem involving partial differential equations. The problem remaining is the solution of Equation 2.71. If the system and observation are linear in z, then Equation 2.71 can be solved exactly. Let g(t,z) = G(t)z

+ r(d

(2.73)

h(t,z) = H ( t ) z

(2.74) 111.

Accuracy of State and Parameter Estimates in Ordinary Differential Equations

Then a solution of the form

r(T,c) =

P(7')C 2

+G(7)

(2.75)

~-

is assumed for Equation 2.71. Substituting Equation 2.75 into Equation 2.71 yields the equations (for dynamical and measurement error)

4(T)

=

G(T)z(T)

+ ~ ( 7 ' +) PH?'Q[y(T) - H( T ) z (T)] (2.76)

P(T) = G(T)P(T)

+ P(T)GT(T)- PHTQHP+ R-' (2.77)

which is the so-called Kalman filter. tions are

The initial condi-

P(0) = Po-'

(2.78)

S(0) = mo

(2.79)

Note that z ( T ) = r(T,0) is the optimal filtering estimate. Although it is not evident from this solution, P(T) is the covariance matrix of the estimate errors (this point is discussed in Part 11). We now return to the original nonlinear problem. The reader will recall that in the introductory remarks we mentioned that the solution of the nonlinear sequential estimation problem requires a linearization a t some point. We are only interested in solutions of Equation 2.71 around c = 0, so we assume a solution of the form of Equation 2.75, substitute into Equation 2.71, and retain only terms linear in c . Equating coefficients of like powers of c we obtain : no dynamical noise

k

= g(T,B)

P = g,P

+ P(Z')hETQ[y(T) -- h(T,P)]

(2.80)

+ P g Z T+ P {h,?'Q[y(T)- h ( ~ , ~ ) I ] J ' (2.81)

additive dynamical noise, same as Equation 2.80

P = g.g -t- Pgz'

systems. Similar estimation and filtering results can be obtained for discrete systems (2GB) and for continuous systems with discrete measurements (7B). In chemical engineering applications, nonlinear filtering has been applied in the feedback control of a continuous stirred tank reactor with random disturbances by Seinfeld et al. (32B). An approximate technique for the optimal stochastic control of nonlinear systems has been proposed by Seinfeld (37B), in which nonlinear filtering is used with a n instantaneous optimal controller. Gavalas and Seinfeld ( 7 7B) used nonlinear filtering to estimate states and kinetic parameters in a tubular reactor with catalyst decay.

The process of estimating states and parameters in nonlinear models is not complete without an analysis of the accuracy of these estimates, The analysis of the accuracy of parameter estimates in nonlinear algebraic models (Part I) is well known and will not be reiterated here (3C). Use of such analyses has already been described in Part I in connection with model discrimination and experimental design in chemical kinetic experiments (764 77A). In Part I11 we will discuss the analysis of the accuracy of parameter estimates in ordinary differential equations from discrete noisy measurements (11.1) and the analysis of the accuracy of state (and parameter) estimates from continuous noisy measurements (11.2). Let us first consider the analysis of the accuracy of parameter estimates in ordinary differential equations from discrete noisy measurements, corresponding to 11.1. The true value of the parameter vector will be denoted k* and the estimated value k. We will be interested in computing the covariance matrix of the estimate error, P = E [ ( &- k*)(k - k*)T]. Obviously, one objective will be to have the elements of P or combinations thereof as small as possible. The state and observations from the system are described by Equations 2.1 and 2.2, respectively. We will assume that the initial system state xo is known. Let us denote the experimental errors in Equation 2.2 by the rn-vector 7,. In order to analyze the accuracy of parameter estimates it is necessary to know some statistical characteristics of the measurement errors. Thus, we will assume that errors of individual data points are independent and normally distributed with zero mean and covariance matrix M,. & has been determined such that the least-square criterion, Equation 2.7, is minimized. If & is close to k*, we can write & = k* K. The deviation K from k* induces a deviation from the trajectory x * ( t ) corresponding to k*. It will be necessary to compute the sensitivity coefficient

+

+ P ( h z T Q [ y ( T-) h.(T,z)I ]J' + R-' (2.82)

T h e initial conditions are Equations 2.78 and 2.79. We have confined our attention to continuous dynamic 40

I N D U S T R I A L A N D E N G I N E E R I N G CHEMISTRY

By differentiating both sides of Equation 2.1 with respect to k, and then interchanging orders of differentiation

on the left-hand side, we obtain the parametric sensitivity equations D=AD+B

(3.2)

D(0) = 0 where A t , = ( d j @ x j ) k * and Bil = (dft/bkl)k*. Since k* is fixed, the choice of f; to minimize S is equivalent to choosing K to minimize S

i

= 1, 2,

...,p

It is well known that the choice of Q7 which produces the minimum variance estimate is M7-l. I n other words, the most accurate measurements receive the most influence and the least accurate the least influence. The minimization yields (7C) : R

HK=

D,TG,TM,-l,rl,

(3.4)

DTTGTTM,-lG,D7

(3.5)

r-1

where R

H

= r= 1

and Gij = (dh/k,),*. If H-l exists, then K is a linear function of the ,rl,, and E ( K )= 0. O u r objective is to calculate P = E(KKT).When we use Equation 3.4 and perform the expectation, R

HPH =

D,G,M,-'G,D,

(3.6)

r= 1

T h e right-hand side of Equation 3.6 is simply H.

p = H-1

Thus (3.7)

If H-l exists, parameter estimates can be obtained, the errors K of which are normally distributed with zero mean and covariance matrix P. I n fact, the existence of H-l is a test f the ability to estimate each of the p parameters from the m observations. If H-' does not exist, this is a n indication that too many parameters are unknown for the number of measurements. Having obtained P, all that remains is to investigate confidence intervals associated with individual elements of k or combinations of elements of k. This analysis is explained thoroughly by Rosenbrock and Storey (7C). If r(b) is the confidence interval of a n estimate & corresponding to a unit @-vectorb, then it will be true on a specified proportion of the time that k* will lie between the confidence limits bTk - r(b)

5 bTk* 5 bTk + r(b)

(3.10)

The variance ab2 = bTPb, and r(b) = vub, where v is a number determined by the level of confidence, e.g., a t 5% level, v = 1.96. I t can be shown that up is bounded by the square root of the minimum and maximum eigenvalues of P, X minllz 5 ub 5 X max112. Applications of this analysis for the evaluation of the accuracy of parameter estimates in chemical rate ex-

pressions have been presented by Heineken et al. (75A), Seinfeld and Gavalas (25A), and Ravimohan et al. (244). We now consider the accuracy of state (and parameter) estimates in ordinary differential equations from continuous noisy measurements, corresponding to 11.2. The only exact statements that can be made are in the case of linear state and observation, Equations 2.73 and 2.74, so we will discuss this case briefly. T h e exact solution of the estimation problem is given in this case by the Kalman filter, Equations 2.76 and 2.77. O u r derivation was carried out by treating the estimation problem as a deterministic one in the calculus of variations. T h e Kalman filter can also be derived on a statistical basis (77B, 78B), wherein it is assumed that the dynamical and measurement noises are independent random processes (Gaussian white noise) with zero means and covariance matrices R-' and Q-l, respectively. Also the initial state is normally distributed with mean mo and covariance PO-'. T h e objective is a minimum variance filter, Le., to minimize E( [ z ( T ) z(T)][z(T) z ( T ) ] * } by , choice of i ( T ) . T h e result is Equations 2.76 and 2.77, where P ( T ) is the covariance matrix of the estimate errors, the same matrix which we have discussed above, but now containing the state estimates. One area of practical importance is the analysis of the behavior of the Kalman filter when errors are made in setting model parameters or the noise covariance matrices, i.e., when the model and noise characteristics are chosen incorrectly and used in the filter equations. Sensitivity of the Kalman filter for large errors has been studied by Nishimura (27B) and Heffes (743) for discrete systems and Nishimura (2823) for continuous systems. Price (29B) has investigated the effect of errors in plant dynamics and noise covariances in the Kalman filter and has presented boundedness conditions for the covariance matrix. A recent comprehensive study of large- and small-scale sensitivity analysis of the Kalman filter has been presented by Griffin and Sage ( 73B). Kalman and Bucy derived the minimum variance filter when all measurements contain white noise, i.e., noise with correlation times much shorter than the response times of the dynamic system. Bryson and Frazier (4B) solved the corresponding smoothing problem. Bryson and Johansen (5B) derived the maximum likelihood filter for the case in which some measurements contain either no noise or colored noise, i.e., noise with correlation times comparable to the response times of the dynamic system, and Mehra and Bryson (24B) solved the corresponding smoothing problem. No exact statistical statements can be made in the case of the nonlinear filtering equations, Equations 2.80 and 2.81 or Equations 2.80 and 2.82. As we noted, the statistical formulation of the nonlinear filtering problem by the stochastic Ito calculus leads a t present to a n insoluble problem. Nevertheless, P( T ) in Equations 2.81 and 2.82 can be viewed as a n approximation to the true, but unknown, covariance matrix of the estimate VOL. 6 2 NO.

1

JANUARY 1970

41

errors. The initial condition, Equation 2.78, can be chosen to reflect the level of certainty of knowledge of the true initial value. I n the absence of detailed a priori statistical information this choice of initial conditions is to a large extent arbitrary. This presents a difficulty, since the time required for a satisfactory tracking of the system depends on the initial conditions. By a n asymptotic analysis for small times, Gavalas et al. (72%) have expressed the initial optimal estimates in terms of the initial values of the observations and their first derivatives. NOMENCLATURE

A b

= n X n matrix in Equation 3.2

B

= = = =

#-dimensional vector n X p matrix in Equation 3.2 n p-dimensional vector C n X p matrix of sensitivity coefficients in Equation 3.1 D m-dimensional vector of residuals e, f = n-dimensional vector function =n p-dimensional vector function g G = (n + p ) x (n + P I matrix = Hamiltonian function defined in Equation 2.44 H = p X p matrix defined by Equation 3.5 H = Hessian matrix a t k(i) Hi I = identity matrix k = p-dimensional parameter vector = expected mean of the initial state mo M ( k ) = m X m moment matrix of residuals = n +$-dimensional particular solution P = (n p ) x ( n p ) covariance matrix p = m X m weighting matrix Q = n p-dimensional vector defined in Equation 2.37 r =

+

+

+ +

+

r(T,c) = z ( T , T , c j = number of data points R = ( n + p ) X ( n + p ) weighting matrix R = matrix defined in Equation 1.10 Ri = n +p-dimensional vector defined in Equation 2.37 S

S t,T wi X

y 7 .

objective function time variables = p-dimensional vector defined in Equation 1.11 = n-dimensional state vector = m-dimensional observation vector = n p-dimensional state vector = =

+

Greek symbols 01

p Y 1)

6 K

x

Y Pt

(Tb

%

+

1~.

+

2(n $)-dimensional vector of constants ( n +$)-dimensional vector function in Equation 2.30 = confidence interval = observation error = (n p)-dimensional vector function in Equation 2.29 = p-dimensional vector = (n p)-dimensional adjoint vector = constant in confidence interval = parameters in Equation 1.10 = standard deviation of parameter estimate = 2(n p ) X 2(n p ) fundamental matrix = ( n + p ) x ( n + p ) matrix = criterion function = constraint function = =

+

+

+

+

Jvo,

(4A) Blakemore, J. W., and H o d , A. E., Chem. En!. Progr. Symp. series 42, 5 9 , 14 (1963). (54.) Booth, G. W . and Peterson, T. 1.; Computer Program: WL NLI-Nonlinear Estimation, IBM: New York, 1959. (6A) Box, G. E. P., and Lucas, H. L., Biometrikn, 4 6 , 77 (1959). (7A) Box, G . E. P., Ann. N . Y.Acad. Sci., 86, 792 (1960). @A) Broydon, C. G., M a t h . Comp., 21, 368 (1967). (9A) Cull, N. L., and Brenner, H. H., IND.END.CHEM., 5 3 , 833 (1961). (10A) Davidon, W. C., “Variable Metric Method for Minimization,” A,E,C, Research and Development Report .4NL-J990, 1959. (11A) Donnelly, J. K., and Quon, D., A.1.Ch.E. Tampa Meeting, Paper 19f, M~~ 19-22 (1968). (12A) Fletcher, R., and Powell, M. J. D., T h e Computer J . , 6 , 163 (1963). (13A) Grrenstadt, J., Math. Comp., 21, 360 (1967). (14A) Hartley, H. O., Technonielrzci,3 , 269 (1961). , (15A) Heineken, F. G.: Tsuchiya, H.M., and Ariq, R., Maih. B i o ~ c i ~ n c1 ~, ~115 (1967). (1GA) Kittrell, J. R . , Hunter, W. G., and Watson, C. C., A.I.Ch.E.J., 11, 6 , 1051 (1965); 12, 1, 5 (1966). (17A) Kittrell, J. R., Mexalai, R., and Watson, C. C., IND. ENG.cHEM., 57, 12, 18 (1965). (18A) Lapidus, L., and Peterson, T. I., A.I.Ch.E.J., 11, 5, 691 (1965). (19A) Marquardt,D.W., J.Siarn, 1 1 , 2 , 431 (1963). (20A) hfezaki, R., and Kittrell, J. R., IKD.E m . Cnmi., 5 9 , 5, 63 (1967). (21.4) Peterson, T. I., Chem. Eng. Ptogr. Symp. Series N o . 31, 5 6 , I 1 1 (1960). (22A) Peterson, T. I., and Lapidus, L., Chem. Eng. Sci., 21, 655 (1966). (23A) Peterson, T. I., ibid., 17, 203 (1962). (24A) Ravimohan, A . L., Chen, U’. H., and Seinfeld, J. H., Can. J . Ch.E., submitted for puhiicarion. ( 2 5 ~ )Seinfeld, J. H., and Gavalas, G . R., 6lst Annual Meeting A.I.Ch.E., preprint 24a, Los iingeles, Dec. 1968.

B. Linear a n d Nonlinear Filtering on ~~~~~~~j~ (1B) Arhans, M., U’ishner, R. p., and Bertolini, A , , IEEE Trans. Control, AC-13, 504 (1968). (2B) Bass, R . W., Norum, V. D., and Schwartz, L., J . M a i h . Anai)ricnndApfi[,, 16, 152 (1966). ~ (3B) Bellman, R . E., Kagiwada, H. H., Kalaba, R . E., and Sridhar, R., J . ~ naui. Sci., 13, 3, 110 (1966). (4B) Bryson, A . E., and Frazier, hi., “Smoothing for Linear and Nonlinear systems,” Proc. of the Optimum Systems Synthesis Conf., T D R A S D - T R D - ~ 19, ~-~ Wright Parteraon AFB, Ohio, p. 354, Feb. 1963. ~ B Bryson J Jr A . E., and Johanscn, D . E., IEEE T m n i . on nuiomatic ~ ~ ~ (- AC-10, 4 i19i;). (6B) Bucy, R . S., ibid., AC-10, 1 9 8 (1965). (7B) Cor, H., ibid., AC-9, 5 (1964). (8B) Denhan, W . F., and Pines, S.,A I A A Jour., 4 , 6 , 1071 (1966). (7B) Detchmendy, D. M., and Sridhar, R . , J . Baric Eng.. 88, 362 (1966). (10B) Friedland, B., and Bernstein, I., J . 27r‘ranklzn Inst., 281, 6, 4 j j (1966). (11B) Gavaias, G . R . , and Seinfeld, J. H., Chem. Enp. Sci.,2 4 , 625 (1969). (12B) Gavalas, G. R., SeinfeId, J. H., and Sridhar, R., J . Basic Eng., submitted for publicarion. (13B) Griffin, R . E.:and Sag?, A . P., IEEE Trans. on Automafic Control, AC-13, 320 (1968). (14B) Heffes, J., ibid., AC-11, 5 4 1 (1966). (l5B) Ho, Y .C., and Lee, R . C. K . , ;bad., AC-9, 333 (1964). (16B) Jazwinski, A. H., ibtd., AC-11, 765 (1966). (17B) Kalman, R . E., J.B n s i c Eng., 82, 35 (1960). (18B) Kalman, R . E., and B u q , R. S . , J.Boric En!., 83, 95 (1961). (19B) Kushner, H. J . , J . Dif.Equaizonr, 3 , 179 (1967). (20B) Kushner, H. J., J . S I A M Control, 2 , 106 (1964). (21B) Kushner, €I. J., IEEE Trans. on Aulomnfic Control, AC-12, 262 (1967). (22B) Kushner, H . J., Joint Automatic Control Conf., Philadelphia, Penn., p. 613, June 1967. (23B) Larsen, R. E., Dressler, R . h i . , and Ratner, R. S . , ibid., p. 634, June 1967. (24B) ,Mehra, R . K., and Bryson, J r . , A . E., Joint Automatic Control Conf., Session 23, Paper h, Ann Arbor, p. 871 (1968). (25B) ;Mowery, V. O., IEEE T r a n s . on Automatic Control, AC-9, 399 (1965). (26B) Neal, S . R., ibid., AC-13, 705 (1968). (27B) Nishimura, T . , itid., AC-12, 123 (1967). (28B) Nishimura, T., ibid., p 268. (29B) Price, C. F., ibid., AC-13, 699 (1968). (30B) Schwartc, L., and Stear, E. B., ibid., p 83. (31B) Seinfeld, J. H., A.I.Ch.E.J., in press. (32B) Seinfeld, J. H . , Gavalas, G. R., and Hwang, M., IND.ENG.CHSM.FUNDAM., 8,257 (1969). (33B) Sridhar, R., Second Joint L’.S.-Japan Seminar on Applied Stochastics, Washington, D.C., Sept. 1968. (34B) Wonham, W. M.: J . S I A M Conlroi, 2 , 3, 347 (1965).

Superscripts A

*

= estimated value =

true value

REFER ENCES A.

Nonlinear Parameter Eatimation (1A) Bard, Y. “Comparison of Gradient Methods for the Solution of Nonlinear Parameter istimation Problems,” IBM New York Scientific Center Report No. 320-2955,. Seut. . 1968. (2A) Bellman, R. E., Jacquez, J., Kalaba, R., and Schwimmer, S . , Math. Btosciences, 1 , 71 (1967). (3A) Bellman R. E., Kagiwada H. H. and Kalaba R . E., “Quasilinearization System Ideritification and Pred?tction,)”TheRAND borp., Memo RM-3812-PR, Aus. 1963.

42

INDUSTRIAL AND ENGINEERING CHEMISTRY

C. Books (1C) Bellman, R. E., and Kalaba, R. E., “Quasilinearization and Nonlinear Boundary Value Problems,” American Elsevier, New York, 1965. (2C) Bryson, Jr., A. E., and Ho, Y. C., “Applied Optimal Control,” Riaisdell, Waltham, Mass., 1969. (3C) Deutsch, R., “Estimation Theory,” Prentice-Hall, Englewood Cliffs, N.J., 1965. (4C) Gelfand, I. M., and Fomin? S . V., “Calculus of Variations,” Prentice-Hall, Englewood Cliffs, N.J., 1963. (5C) Lee, E. S., “ Quasilinearization and Invariant Imbedding,” Academic Press, New York, 1968, (6C) Lee, R. C. K., “Optimal Estimation, Indentificarion, and Control,” M.I.T. Press, Cambridge, Mass., 1964. (7‘2) R o s e n e c k , H. H., and Storey, C.,“ComputationalTechniques for Chemical Engineers, Pergamon Press, New York, 1966.

l

l

~

~

~