The control of chemical reaction systems is an area of increasing interest to chemical engineers. This is self-evident from an examination of the number of publications which have appeared in the technical and research journals over the past few years. This interest is motivated by the theoretical development itself and by the prospective area of on-line digital control systems. In the present paper the significance of a single body of control theory, namely, the control of linear systems subject to quadratic performance indices, is shown. Owing mainly to the work of Kalman, this is the one area in which control theory has developed to a remarkable point and for which calculations for high-dimensional problems may be carried out in a simple and straightforward manner. To illustrate the ideas and concepts of this theory the Hamilton-Jacobi formulation is used, since it has not appeared in the chemical engineering literature. In addition, it will be shown that stability via the Liapunov function is an inherent part of the formulation and solution, and that the filtering problem, which is the dual of the control problem, may be placed within the same theory. Also, the theory forms the basis for an iterative solution of nonlinear control problems. While the necessary proofs are not presented here, adequate references are given for the interested reader.
Unification of the stability and filtering problems associated with control of processes is possible, if we understand the significance of the unifying theory. The author discusses that significance and illustrates it with Hamilton-Jacobi formulations.
LEON LAPIDUS
CONTROL PROBLEM DEFINITION

It is important that we define the control problem under investigation in rather precise terms. Thus we assume that we are given the linear dynamical system:

$$\dot{x}(t) = Ax(t) + Bu(t); \quad x(t_0) = x_0 \tag{1}$$

$$y(t) = Hx(t) \tag{2}$$
where x(t) is a real n-vector called the state, u(t) is a real r-vector called the control, and y(t) is a real m-vector, m ≤ n, called the output. A, B, and H are constant or time-varying matrices, with H relating the output to the state. The inequality m < n implies that H is not square and that fewer elements can actually be measured than exist in the real dynamical system. If m = n and H = I, all of the states are measurable and are, in fact, the outputs. To define a measure of the control we use the scalar performance index
$$2I(x_0, t_0, t_f; u) = \lambda[x(t_f)] + \int_{t_0}^{t_f} J[x(t), u(t)]\,dt \tag{3}$$

where

$$\lambda[x(t_f)] = (x(t_f), Sx(t_f)) \tag{4}$$

and

$$J[x(t), u(t)] = (x, Qx) + (u, Ru) \tag{5}$$
The functionality of I shows that the initial state, x₀, the initial time, t₀, the final time, t_f, and the control, u, all determine its numerical value. In these equations (x(t_f), Sx(t_f)) is the dot or inner product of x(t_f) with itself, weighted by the matrix S at time t_f; S, Q, and R are suitably selected weighting matrices with the constraints that:

S and Q are symmetric positive-semidefinite (S, Q ≥ 0)
R is symmetric positive-definite (R > 0)    (6)
There are some simple physical interpretations to Equations 3-5, namely:

λ[x(t_f)] = a weighted measure of the final state which the system achieves at time t_f

∫_{t₀}^{t_f} (x, Qx) dt = a weighted integrated measure of the overall path of the system from time t₀ to t_f

∫_{t₀}^{t_f} (u, Ru) dt = a weighted integrated measure of the overall amount of control used from time t₀ to t_f

Since we shall try to minimize I (see below) and S, Q, and R are open for selection, subject only to the constraints of Equation 6, we will, in general, try to make
each of these terms as small as possible. This will become clearer in the discussion to follow [also, see (14, 15)].
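Because the index is just a weighted terminal term plus two integrals, it can be evaluated for any stored trajectory by simple quadrature. The following is a minimal sketch, not from the paper; the two-state trajectory, the control history, and the weight choices are arbitrary illustrations.

```python
# Numerically evaluating the performance index of Equations 3-5 for a
# stored trajectory. All values below are illustrative assumptions.
import numpy as np

def performance_index(t, x, u, S, Q, R):
    """2I = (x(tf), S x(tf)) + integral of (x, Qx) + (u, Ru) dt."""
    terminal = x[-1] @ S @ x[-1]
    path = np.array([xi @ Q @ xi for xi in x])
    effort = np.array([ui @ R @ ui for ui in u])
    return terminal + np.trapz(path + effort, t)

t = np.linspace(0.0, 5.0, 501)
x = np.stack([np.exp(-t), -np.exp(-t)], axis=1)   # a decaying 2-state path
u = (-0.5 * np.exp(-t))[:, None]                  # a decaying scalar control
S, Q, R = np.zeros((2, 2)), np.eye(2), np.eye(1)
print("2I =", performance_index(t, x, u, S, Q, R))
```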
We define the optimal control law as

$$u^*(t) = -Kx^*(t) \tag{7}$$
where K is the feedback matrix (either constant or time-varying), u*(t) is the optimal control for which I assumes its minimum value, and x*(t) is the corresponding optimal trajectory. Note that, given an initial condition x₀ and a specification of u*(t), Equation 1 generates a unique x*(t). The minimum value of I is defined by

$$I^*(x_0, t_0, t_f) = \min_u I(x_0, t_0, t_f; u) \tag{8}$$
since u = u* is assumed to be determined by the minimization operation on the right-hand side of the equation. On this basis we may define the control problem of interest here as: "Find the control u(t) which drives (x₀, t₀) to (0, t_f) (or y = 0, t_f) subject to the system Equations 1 and 2; that u(t) from among all admissible candidates which yields I* is the optimal u*(t)." Note that for any nonoptimal u we always have the strict inequality

$$I > I^* \tag{9}$$

For the major part of our discussion, we shall choose the matrices A, B, H, Q, R, and S to be constant. This is not necessary but it simplifies the formulation. In addition, we generally will take t₀ = 0 for convenience. It is also important to point out that the control problem does not specify any control or state constraints; we shall discuss later an indirect way to handle such constraints.

Returning briefly to the performance index, Equations 3-5, we wish to point out why the restriction on R exists, as given by Equation 6. If we need to form a scalar function (and we shall have to do this later to satisfy Pontryagin's minimum principle), namely

$$2H = (x, Qx) + (u, Ru) + 2(z, Ax + Bu) \tag{10}$$
where z will be defined shortly, then H can have a minimum only if R ≥ 0; if R > 0, then H has a unique minimum for all x, z, and t (the regular case); if R = 0, then H will have a minimum if and only if z′B = 0, in which case H is independent of u. As a result, we select the case R > 0, which has the feature of removing the possibility of singular-type controls (7, 26). Further, we note that if we interpret (u, Ru) as a "cost of control," then the case R = 0 implies that there is no penalty or cost associated with the optimal control. In such a situation, u*(t) may well turn out to be an impulse function, which is obviously undesirable from an implementation point of view. As a final point here, one may ask why mixed terms in x and u, such as 2(x, Nu), were not included in the integrated portion of Equation 5. The reason for the omission is that a simple transformation (see later) can convert the more complicated form to that of Equation 5. On this basis we tend to use the simpler formulation.
CONTROL PROBLEM SOLUTION

The optimal control problem posed above may be solved in many ways: by dynamic programming (5, 14), by the maximum principle (24), by assuming the a priori solution (17), and even by an elaborate use of ordinary calculus (15). However, we shall use the Hamilton-Jacobi theory here since it is basically new to the chemical engineering literature. The derivations will lack the rigor of formal proofs, but these can be found in Kalman's work (8, 9). We shall define a Hamiltonian function as required by the maximum principle (24) and show that in this way the optimal control problem can be reduced to the solution of the Hamilton-Jacobi (H-J) partial differential equation. The existence of a solution of the H-J equation is then a sufficient condition for the optimal solution, and with certain conditions on the smoothness of I*, this is also a necessary condition.

For the initial part of this discussion let us assume that m = n and let H = I in Equation 2. Then we define a Hamiltonian function:

$$2H(x, z, t; u) = J(x, u) + 2(z, Ax + Bu) \tag{11}$$
where z is a real n-vector called the adjoint, costate, or conjugate vector. Further, we define min
H*(x, z, t) =
u H ( x , z , t; u)
(12)
with u not in H* since it is assumed known from the minimization; this is Pontryagin's minimum principle. We have assumed here that z₀ = 1.0 so that only a normal problem is under consideration (6, 12). According to the explicit form of Pontryagin's principle, the necessary conditions for u*(t) to be optimal are that the canonical equations

$$\dot{x} = H_z, \quad \dot{z} = -H_x \tag{13}$$

where H_z is the gradient vector of H with respect to z, are satisfied and that the minimum condition
$$H(x^*, z^*, t; u^*) \le H(x, z, t; u) \tag{14}$$

hold for any u(t). In addition, z must satisfy certain boundary conditions at t_f. But from our definition of H*, Equation 12, it follows that the free canonical equations have the form

$$\dot{x} = H_z^*, \quad \dot{z} = -H_x^* \tag{15}$$

The terminology "free" is used here because the explicit dependence on u(t) has been removed by the minimization of Equation 12 or 14. It is with this extension of the normal calculus of variations that the maximum principle is concerned. Under suitable smoothness conditions all optimal trajectories can also be determined by solving the H-J partial differential equation

$$I_t^* + H^*(x, I_x^*, t) = 0 \tag{16}$$

subject to

$$2I^*(x, t_f) = \lambda[x(t_f)] \tag{17}$$

In our case, the smoothness conditions are that I* be twice-differentiable [although, see (28)]. The existence of a function satisfying Equations 16 and 17 is a sufficient condition for the existence of an optimal trajectory. As a further major point we can see, if we differentiate the H-J equation with respect to x and compare with the second of the free canonical Equations 15, that z = I_x^*. This is a link between the two theories and, in fact, it is possible to show that the canonical equations are the characteristic equations of the H-J partial differential equation.

The question which immediately comes to mind is how all of this applies to the particular control problem at hand. To answer this question we now turn to more specific details. Since the Hamiltonian for our problem is given by Equation 11 with Equation 5, it follows from the second of Equations 13 that

$$\dot{z} = -H_x = -Qx - A'z \tag{18}$$

But to satisfy the minimum condition, Equation 12 or 14, it is necessary that

$$H_u = 0 \quad \text{or} \quad Ru^* + B'z = 0 \tag{19}$$

This immediately implies the feedback law

$$u^*(t) = -R^{-1}B'z \tag{20}$$

In the same sense we observe that

$$H_{uu} = R > 0 \tag{21}$$

and thus we do get a true minimum at the u* of Equation 20. This incidentally points out why we need to specify R > 0; in addition, R > 0 assures us that the problem is regular with no singularities. But from the first canonical equation of Equation 13 it also follows that

$$\dot{x} = Ax + Bu^*$$

and substituting Equation 20,

$$\dot{x} = Ax - BR^{-1}B'z \tag{22}$$

To summarize, we see that we can now write down the free canonical equations and the optimal control in the form

$$\begin{bmatrix}\dot{x}\\ \dot{z}\end{bmatrix} = \begin{bmatrix} A & -BR^{-1}B' \\ -Q & -A' \end{bmatrix}\begin{bmatrix}x\\ z\end{bmatrix} \tag{23}$$

with

$$u^*(t) = -R^{-1}B'z(t) \tag{24}$$

Also,

$$2H^* = (x, Qx) + 2(z, Ax) - (B'z, R^{-1}B'z) \tag{25}$$
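The coefficient matrix of Equation 23 has the characteristic Hamiltonian structure: its eigenvalues occur in pairs symmetric about the imaginary axis, so half are stable and half unstable. A quick numerical check, with illustrative matrices that are an assumption of this sketch rather than an example from the paper:

```python
# The canonical matrix of Equation 23 has eigenvalues in +/- pairs.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

M = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])
print(np.sort_complex(np.linalg.eigvals(M)))   # symmetric about imaginary axis
```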
At this point we note that if H ≠ I in Equation 2, then all of the above equations hold except that S and Q are replaced with H′SH and H′QH. The only problem remaining is that z(t) occurs in all of the equations above. To remove this dependency we will solve the H-J equation. This may be carried out by recalling that z(t) = I_x^* and assuming that the solution of Equation 16 is given by

$$2I^*(x, t, t_f) = (x, Px) \tag{26}$$
where P is an unknown (at this point) symmetric matrix. But from the definition of I* it follows that 2I*(x(t_f), t_f) = (x(t_f), Sx(t_f)) and thus

$$P(t_f) = S \tag{27}$$
Further, from z = I_x^* and Equation 26 it follows that (a Riccati transformation)

$$z(t) = P(t)x(t) \tag{28}$$
and since Equation 26 is assumed to be a solution of Equation 16, we may substitute the former into the latter. After some algebraic manipulations we find that for the result to hold for any x it is necessary that

$$\dot{P} + A'P + PA - PBR^{-1}B'P + Q = 0 \tag{29}$$

is satisfied. This is the matrix Riccati equation which, while nonlinear, can be solved by some elementary operations (see later). Note that Equation 29 is subject
to the final condition of Equation 27, and does not depend on x₀ or t₀. While we shall not show it here, P(t) is symmetric and positive-definite. But using Equation 28 in Equation 24, we now see that the optimal control is given by

$$u^*(t) = -K(t)x(t) \tag{30}$$

where K(t), the optimal feedback gain, is

$$K(t) = R^{-1}B'P(t) \tag{31}$$
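Equation 29 can be integrated backward from P(t_f) = S with any ordinary-differential-equation routine, after which Equation 31 gives the time-varying gain. A minimal sketch, assuming an arbitrary two-state system and SciPy's solve_ivp:

```python
# Backward integration of the matrix Riccati Equation 29 from P(tf) = S,
# then K(t) = R^-1 B' P(t) of Equation 31. Matrices are illustrative.
import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R, S = np.eye(2), np.eye(1), np.zeros((2, 2))
Rinv = np.linalg.inv(R)
tf = 5.0

def riccati_rhs(t, p_flat):
    P = p_flat.reshape(2, 2)
    # Pdot = -(A'P + PA - P B R^-1 B' P + Q), rearranged from Eq. 29
    dP = -(A.T @ P + P @ A - P @ B @ Rinv @ B.T @ P + Q)
    return dP.ravel()

sol = solve_ivp(riccati_rhs, [tf, 0.0], S.ravel())   # integrate backward
P0 = sol.y[:, -1].reshape(2, 2)
K0 = Rinv @ B.T @ P0                                 # gain at t = 0, Eq. 31
print("P(0) =\n", P0, "\nK(0) =", K0)
```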
Note that the optimal control law is linear and closed-loop, but with time-varying feedback. If we now insert this optimal control into the system Equation 1,

$$\dot{x}(t) = [A - BK]x(t) \tag{32}$$

whose solution is merely

$$x(t) = \left(\exp[(A - BK)(t - t_0)]\right)x_0 \tag{33}$$

Equation 33 can be written as

$$x(t) = \Phi^*(t, t_0)x_0 \tag{34}$$

where Φ*(t, t₀) = exp{[A − BK](t − t₀)} is the transition matrix, having the properties that

$$\Phi(t_0, t_0) = I, \quad \Phi^{-1}(t, t_0) = \Phi(t_0, t) \tag{35}$$

and

$$\dot{\Phi}(t, t_0) = (A - BK)\,\Phi(t, t_0)$$
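For constant K, a matrix exponential routine is all that Equations 33-35 require. A minimal sketch with an assumed (stand-in) constant gain:

```python
# Propagating the closed-loop trajectory with a matrix exponential,
# Equations 33-35. The gain K here is an arbitrary stand-in value.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[0.4, 0.2]])
x0 = np.array([1.0, 0.0])

Phi = lambda t: expm((A - B @ K) * t)       # transition matrix, Eq. 34
print(Phi(1.0) @ x0)                        # x(1), Eq. 33
# property check, Eq. 35: Phi(t, t0)^-1 = Phi(t0, t)
print(np.allclose(np.linalg.inv(Phi(1.0)), Phi(-1.0)))
```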
In summary to this point, we see that the optimal control is given by Equation 30, where K(t) is determined by solving the Riccati equation with a known condition at t = t_f; the resulting optimal trajectory is given by Equation 33 or 34, which requires only a matrix exponential routine for evaluation.

The only problem remaining is to illustrate a simple way to evaluate K(t) or P(t) via the Riccati equation. This procedure will involve the transition matrix of the canonical Equations 23 and will require some simple matrix operations. Thus we write the solution of Equation 23 in the form

$$\begin{bmatrix}x(t_f)\\ z(t_f)\end{bmatrix} = \Phi_c(t_f, t)\begin{bmatrix}x(t)\\ z(t)\end{bmatrix} \tag{36}$$

where Φ_c(t_f, t) is the transition matrix for the coefficient matrix of Equation 23. But we may write Φ_c(t, t₀) or Φ_c(t_f, t) in the form of submatrices as

$$\Phi_c = \begin{bmatrix}\Phi_{11} & \Phi_{12}\\ \Phi_{21} & \Phi_{22}\end{bmatrix} \tag{37}$$
Substituting Equation 37 into Equation 36 and using z(t_f) = P(t_f)x(t_f), we get, after some manipulation,

$$z(t) = [\Phi_{22} - P(t_f)\Phi_{12}]^{-1}[P(t_f)\Phi_{11} - \Phi_{21}]\big|_{(t_f,\,t)}\, x(t) \tag{38}$$

assuming that the inverse exists. But from Equation 28:

$$P(t) = [\Phi_{22} - P(t_f)\Phi_{12}]^{-1}[P(t_f)\Phi_{11} - \Phi_{21}]\big|_{(t_f,\,t)} \tag{39}$$
By using a matrix exponential routine to calculate Φ_c(t_f, t) and an inversion routine, along with the given P(t_f), we can calculate P(t) at any t [and thus K(t) from Equation 31]. No differential equations need to be integrated, although, of course, the integration can be carried out if desired. While we have these equations at hand, we note that if P(t) converges to a limit as t_f becomes large (t_f → ∞), then this is equivalent to letting Ṗ(t) = 0 in Equation 29. This also implies that the limiting P(t) is independent of P(t_f), and thus if we take P(t_f) = 0, Equation 39 becomes
$$P(t) = -\Phi_{22}(t_f, t)^{-1}\,\Phi_{21}(t_f, t); \quad t_f \to \infty \tag{40}$$
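A minimal sketch of this transition-matrix route to P(t), assuming an illustrative system and cross-checking the long-horizon result against SciPy's algebraic Riccati solver:

```python
# P(t) from submatrices of the canonical transition matrix, Eqs. 39-40,
# using only a matrix exponential and an inverse. Matrices illustrative.
import numpy as np
from scipy.linalg import expm, solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
n = A.shape[0]

M = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])                     # coefficient matrix of Eq. 23

def P_of(t, tf):
    Phi = expm(M * (tf - t))                   # Phi_c(tf, t), Eq. 37
    Phi21, Phi22 = Phi[n:, :n], Phi[n:, n:]
    return -np.linalg.inv(Phi22) @ Phi21       # Eq. 40 with P(tf) = 0

print(P_of(0.0, 30.0))                         # long horizon: near P_infinity
print(solve_continuous_are(A, B, Q, R))        # steady-state comparison
```

For very long horizons the exponential grows rapidly, so in practice the backward Riccati integration is the numerically safer route; the sketch above simply mirrors the matrix operations of Equations 39-40.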
CONTROL SOLUTION FOR t_f = ∞

A considerable simplification can be effected if we let t_f = ∞. In such a case we must be sure that an optimal solution actually exists and that I* is finite. To assure ourselves of this point we need to introduce the concept of complete controllability. The mathematical theorem (2, 8, 9) is that, for an invariant system (A, B constant), a necessary and sufficient condition for complete controllability is that

$$\text{rank } [B, AB, \ldots, A^{n-1}B] = n \tag{41}$$
In more general terms, this means that any initial state can be taken to the origin in a finite length of time by some suitable control function. On this basis we may state that if A and B are constant, the system is completely controllable, and S = 0, then the performance index is always finite and there exists a solution to the optimization problem. Further, the limit

$$\lim_{t_f \to \infty} P(t) = P_\infty \tag{42}$$

is unique, and P_∞ is an equilibrium state of the Riccati equation, Ṗ(t) = 0. When S ≠ 0 this is no longer necessarily true, since the steady-state Riccati equation is nonlinear and, depending on the initial condition used, the limiting solution as t → ∞ may lead to different roots or equilibrium states. This implies that when we discuss the case of t_f = ∞ we should always use S = 0; but this merely acknowledges that in an infinite time of control the final state is x(t_f) = 0. Incidentally, this is perfectly feasible for linear systems, but for nonlinear systems with multiple steady states and possibly limit cycles such an acknowledgment may be inappropriate. When t_f = ∞, the optimal control is given by

$$u^*(t) = -R^{-1}B'P_\infty x(t) = -K_\infty x(t) \tag{43}$$
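A minimal sketch of the infinite-time case, Equations 42-43, with SciPy's algebraic Riccati solver standing in for the backward integration; the system matrices are illustrative assumptions:

```python
# P_inf from the steady-state Riccati equation and the constant gain
# K_inf of Equation 43.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

P_inf = solve_continuous_are(A, B, Q, R)   # equilibrium of Eq. 29 (Pdot = 0)
K_inf = np.linalg.inv(R) @ B.T @ P_inf     # Eq. 43
print("K_inf =", K_inf)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B @ K_inf))
```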
TABLE I. SUMMARY OF EQUATIONS FOR CONTROL OF LINEAR MODEL WITH QUADRATIC PERFORMANCE INDEX

Figure 1. Feedback system for optimal control of a linear system with quadratic performance index
In other words, the optimal control uses a constant feedback gain matrix obtained by, say, Equation 40, or by integrating the Riccati equation backward in time. The optimal transition matrix is also constant. Note that when t_f = ∞ the solution of the normally coupled canonical equations appears as a decoupled boundary-value problem. The constant feedback gain matrix is determined independently of x(t) (a backward time pass) and then the optimal control or the resulting state is evaluated using the constant matrix K_∞ (a forward pass). Figure 1 shows the feedback loops associated with the optimal control system and Table I summarizes the equations of interest.

As has been mentioned, this theoretical development is based upon a strict neglect of any control or state constraints. While this may seem a serious restriction, a brief consideration reveals that a suitable choice of the weighting matrices Q and R can indirectly satisfy constraints. Thus, if an optimal trajectory is obtained for which one element of the control vector is above a constraint, it is only necessary to repeat the optimal control calculation but with the corresponding element in R increased. This will penalize that particular control element and cause the control to drop lower. Obviously, the same idea can be used for any number of the control variables, and state constraints can be compensated for using Q. In this manner (14, 15) it is possible to impose an indirect form of control and state constraints.

AUTHOR Leon Lapidus is Professor of Chemical Engineering at Princeton University.

There are two extensions of the above development which are important. These extensions involve, first, the use of time-varying matrices and, second, the introduction of sampling to yield the discrete form of the theory. While the discussion above has assumed that A, B, H, Q, R, and S were all constant, the extension to some or all time-varying matrices can be carried out. The theory becomes more difficult and the proofs become more complex, but the extension, especially in a computational sense, is straightforward. The introduction of sampling merely converts all the continuous equations to their discrete analogs; i.e., the system equations become difference equations and the performance index becomes a sum. The theory itself follows through in basically the same way with slight variations; i.e., in the sampled case R = 0 can be a feasible choice for the control weighting matrix.
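A minimal sketch of the sampled analog just mentioned: a zero-order-hold discretization followed by backward iteration of the difference Riccati equation to a constant gain. The sampling period and matrices are illustrative assumptions.

```python
# Discrete (sampled) form: difference Riccati equation iterated backward.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R, dt = np.eye(2), np.eye(1), 0.1

# zero-order-hold discretization of the continuous system
M = expm(np.block([[A, B], [np.zeros((1, 3))]]) * dt)
Ad, Bd = M[:2, :2], M[:2, 2:]

P = np.zeros((2, 2))                      # P at the final time, S = 0
for _ in range(500):                      # backward iteration toward P_inf
    K = np.linalg.solve(R + Bd.T @ P @ Bd, Bd.T @ P @ Ad)
    P = Q + Ad.T @ P @ (Ad - Bd @ K)
print("discrete gain K =", K)
```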
STABILITY OF OPTIMAL CONTROL
Since the optimal system (with t_f = ∞) is given by Equation 32 with K = K_∞, the stability of the optimal trajectory is determined by the eigenvalues of [A − BK_∞]. If these all have negative real parts, then the system is stable; this is true even if A has some positive eigenvalues (even if the original uncontrolled system is unstable), since the subtraction of BK_∞ changes the situation. It follows in a general sense that if any of the optimal eigenvalues are positive, some of the states will not go to the origin. But this contradicts our hypothesis that x(t = ∞) = 0, and thus one would infer that the optimal system is stable. To put this on a firmer theoretical basis we turn to the notion of a Liapunov function (10, 21). Note first that a simple computational means for evaluating stability would be to take the transition matrix Φ(t, t₀) of the system (Equation 33) and raise this matrix to high powers. If the norm of the result approaches zero, then the optimal system is stable.

Liapunov's method for determining stability resides in specifying that a scalar function V(x) exists which obeys the following conditions along all trajectories of a dynamic system:

(a) V(x) > 0, x ≠ 0
(b) V̇(x) < 0, x ≠ 0
(c) V(x) = 0, x = 0
(d) V(x) → ∞ as ‖x‖ → ∞    (44)
Statements (a) and (d) say that V(x) = constant generates a series of closed surfaces surrounding x = 0; (b) says that all trajectories of the dynamical system enter these surfaces heading for x = 0 and that this happens for every closed surface. A system which has these properties is globally asymptotically stable. If some of the restrictions of Equations 44 are removed, the statement about the nature and the region of stability must be relaxed accordingly (10). We see immediately (subject to complete controllability) that I*(x, t, ∞) is a Liapunov function. Thus
$$2I^*(x, t, \infty) = \int_t^{\infty} [(x, Qx) + (u^*, Ru^*)]\,dt \tag{45}$$

satisfies (a), (c), and (d) above. Further,

$$2\dot{I}^*(x, t, \infty) = -[(x, Qx) + (x, K_\infty' R K_\infty x)]$$

But since R is positive-definite, K_∞′RK_∞ is also positive-semidefinite and İ* < 0. Thus (b) above is also satisfied and

$$V(x) = 2I^*(x, t, \infty) \tag{46}$$

Note that this implies that if the solution of the Riccati equation converges (in our current context it does), this immediately tells us that the optimal system is asymptotically stable [see Kishi (13) for a means of implementing this idea].

To illustrate this point in a different fashion let us select a Liapunov function as

$$V(x) = (x, Px) \tag{47}$$

where P is a constant symmetric matrix, so that

$$\dot{V}(x) = (\dot{x}, Px) + (x, P\dot{x}) \tag{48}$$

Now we insert the system equation into Equation 48 to give

$$\dot{V}(x) = (x, (A'P + PA)x) + 2(x, PBu) \tag{49}$$

Note at this point that if we wish to make V̇(x) as negative as possible [see (b) above and also see Equation 50 below], we should select u(t) as related to the inverse sign of B′Px. Note the comparison to Equation 43. Now if we set V̇(x) = 2İ*(x, t, ∞) (from Equations 45 and 49), some algebraic manipulations show that the steady-state form of the Riccati equation must be satisfied; i.e., P = P_∞ leads to a Liapunov function. For a further discussion of this point see the simple example in a later section of this paper.

Actually this same point can be brought out for the present case by a different approach to Liapunov's theory, an approach which uses the linear and constant forms of the various equations. This approach resides in the theorem (10): "Given a continuous-time, free, linear, stationary dynamic system of the form

$$\dot{x}(t) = Wx(t)$$

the equilibrium state of this system, x = 0, is asymptotically stable if and only if, given a symmetric positive-definite matrix Q, there exists a symmetric positive-definite matrix P which is the unique solution of

$$W'P + PW = -Q \tag{50}$$

In addition,

$$I^*(x) = \tfrac{1}{2}(x, Px) \tag{51}$$

is a Liapunov function for the dynamic system." To illustrate how this theorem can be applied to the optimal control trajectory, we note first from Equation 32 that in our notation

$$W = [A - BK_\infty] = [A - BR^{-1}B'P_\infty] \tag{52}$$
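A minimal numerical check of this theorem on the optimal system, with illustrative matrices: using W from Equation 52 and, anticipating Equation 55 below, the positive-definite weight Q̂ = Q + PBR⁻¹B′P, the Lyapunov equation returns the steady-state Riccati solution itself.

```python
# Lyapunov-equation check of Equations 50-52 for the optimal system.
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
Rinv = np.linalg.inv(R)

P = solve_continuous_are(A, B, Q, R)
W = A - B @ Rinv @ B.T @ P                        # Eq. 52
Q_hat = Q + P @ B @ Rinv @ B.T @ P                # weight of 2*phi, Eq. 55
P_lyap = solve_continuous_lyapunov(W.T, -Q_hat)   # solves W'X + X W = -Q_hat
print(np.allclose(P_lyap, P))                     # True: (x, Px)/2 is Liapunov
```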
Next we note, from Equation 25 and the fact that z = I_x^*, that

$$H^* = \tfrac{1}{2}(x, Qx) + (I_x^*, Ax) - \tfrac{1}{2}(I_x^*, BR^{-1}B'I_x^*) \tag{53}$$

But if we observe that

$$(I_x^*, Ax) - \tfrac{1}{2}(I_x^*, BR^{-1}B'I_x^*) = (I_x^*, \dot{x}) + \tfrac{1}{2}(I_x^*, BR^{-1}B'I_x^*)$$

then Equation 53 can be written as

$$H^* = \tfrac{1}{2}(x, Qx) + \tfrac{1}{2}(I_x^*, BR^{-1}B'I_x^*) + (I_x^*, \dot{x}) \tag{54}$$

Now if we substitute this last equation into the H-J equation, and note that I_x^* = Px, there results

$$\dot{I}^* = -\phi(x)$$

where

$$\phi(x) = \tfrac{1}{2}(x, [Q + PBR^{-1}B'P]x) \tag{55}$$

From the properties of Q, B, and R, and if P is positive-definite, then φ(x) is also positive-definite. Substituting Equations 52 and 55 into Equation 50 there results

$$A'P + PA - PBR^{-1}B'P + Q = 0 \tag{56}$$

which is simply the steady-state form of the Riccati Equation 29. From the properties of P in the Riccati equation, we thus are assured that φ is positive-definite and that I* is, in fact, a Liapunov function. In other words, we have proved the necessary and sufficient conditions for the optimal trajectory to be asymptotically stable [since I*(x) is positive-definite for all x, the trajectory is really globally asymptotically stable].

IMPLEMENTATION OF CONTROL

It should be apparent that, when x is of dimension 5-10 or greater, the actual application of these formulas to obtain the optimal control and trajectory will require a considerable computer programming effort. Fortunately, this is no longer the case because of the outstanding work of Kalman and Englar (12). They have already written a general-purpose program suitable for the IBM 7090/94 which solves all problems of the type we are considering with a minimum of effort by the user. Subject only to the limitation n < 16, they have put together a series of subprograms which carry out all the desired optimal calculations when the user simply punches up a few input cards. In other words, a single card as input calls the Riccati equation solver and generates either P(t) or P_∞; all operations are carried out by simple macroinstructions invoked by the user. An important feature of these subprograms is that each is monitored in the most sophisticated manner to minimize computational errors. The writer cannot overemphasize the importance of a program of this type, which puts the optimal control of linear systems with quadratic performance indices on a computational equivalent with, say, a subroutine for calculating the exponential of a scalar number or for integrating ordinary differential equations. The solution to such control problems is now available to all, even those with a minimal mathematical and control background. Further areas of use of this program will be developed shortly in this paper, but even in the present context the importance of having such a program available should be obvious.

SIMPLE EXAMPLE

To illustrate in a simple manner some of the features of the control problems we have been discussing, let us consider the scalar problem with system equation

$$\dot{x}(t) = ax(t) + bu(t)$$

and the performance index

$$2I(x_0, t_0, t_f; u) = s\,x(t_f)^2 + \int_{t_0}^{t_f} (qy^2 + ru^2)\,dt \tag{57}$$

The coefficients in these equations are constant scalars, with the connection to our previous formulations being

A = [a],  B = [b] > 0,  H = [h],  Q = [q] ≥ 0,  R = [r] > 0,  S = [s] ≥ 0

Obviously we must look into the features of the Riccati Equation 29 to infer certain features of the optimal control. Keeping in mind that we are using y(t) = hx(t), the Riccati equation is

$$\dot{p} + 2ap - \frac{b^2}{r}p^2 + qh^2 = 0 \tag{58}$$

or

$$\dot{p} = \frac{b^2}{r}p^2 - 2ap - qh^2$$
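The steady-state root and the backward-integration behavior derived in the analysis that follows can be checked numerically; a minimal sketch with arbitrary illustrative coefficients (the conclusions below do not depend on these values):

```python
# Numerical check of the scalar example, Equations 58-63.
import numpy as np
from scipy.integrate import solve_ivp

a, b, h, q, r = 1.0, 1.0, 1.0, 4.0, 1.0      # note: open loop unstable (a > 0)
alpha = np.sqrt(a**2 + q * h**2 * b**2 / r)
p1 = r / b**2 * (a + alpha)                  # the nonnegative root, Eq. 59
k_inf = b * p1 / r                           # Eq. 61
print("p1 =", p1, " k_inf =", k_inf, " closed-loop pole =", a - b * k_inf)

# backward integration of Eq. 58 from any s >= 0 settles at p1
for s in (0.0, 10.0):
    sol = solve_ivp(lambda t, p: b**2 / r * p**2 - 2 * a * p - q * h**2,
                    [10.0, 0.0], [s])
    print("s =", s, "-> p(0) =", sol.y[0, -1])
```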
To start our discussion let us consider t_f = ∞. In this case ṗ = 0 and Equation 58 becomes a quadratic algebraic equation in p. Obviously there are two roots to this quadratic equation. These are given by

$$p_1 = \frac{r}{b^2}(a + \alpha), \qquad p_2 = \frac{r}{b^2}(a - \alpha) \tag{59}$$

where, to simplify, α = √(a² + qh²b²/r). Since a + α > 0 while a − α < 0, and we are looking for the p_∞ which is nonnegative, the steady-state solution of the Riccati equation which is of interest is p_∞ = p₁. We note that this value is obtained independent of the numerical value of s. This point can be illustrated more fully by a phase plot of ṗ vs. p; p₁ is a root in the right-half plane and p₂ is a root in the left-half plane. Since there are no further roots, the phase-plane plot through p₁ and p₂ must be parabolic. Using Equations 59 we may write Equation 58 as

$$\dot{p} = \frac{b^2}{r}(p - p_1)(p - p_2) \tag{60}$$

In fact, if we choose some p > p₁ or 0 < p ≤ p₁, Equation 60 shows that p is decreasing or increasing, respectively, during backward integration. Thus, any initial condition p(t_f) = s ≥ 0 will lead to p₁ as the stable root, although the explicit choice of s will lead to different p(t) histories before p₁ is reached. Proceeding further, we see that the feedback gain constant and the optimal system response are given by:

$$k_\infty = \frac{b p_1}{r} \tag{61}$$

$$\dot{x} = (a - bk_\infty)x \tag{62}$$

But

$$a - bk_\infty = a - \frac{b^2}{r}p_1 = -\alpha = -\sqrt{a^2 + \frac{h^2 q b^2}{r}} \tag{63}$$

If a > 0 the original uncontrolled system is unstable, whereas if a < 0 it is stable. Thus the optimally controlled system, whose eigenvalue −α is always negative, is stable irrespective of the stability of the original dynamic system.

An obvious Liapunov function is given by

$$V(x) = \tfrac{1}{2}p_1 x^2$$

which immediately satisfies conditions (a), (c), and (d) of Equation 44. Further,

$$\dot{V}(x) = p_1(a - bk_\infty)x^2$$

Since p₁ > 0, x² > 0, and (a − bk_∞) < 0, we have V̇(x) < 0. Thus condition (b) of Equation 44 is also satisfied, V(x) is a Liapunov function, and the optimal trajectory is asymptotically stable. As a final point of interest with p₁ = p_∞, we see from Equation 61 that as q/r increases (r decreases for fixed q), k_∞ increases. Thus the feedback gain increases as less penalty is paid for control effort. In the limit as r → 0, k_∞ → ∞, and the response is an impulse function.

Turning briefly to the case where t_f is finite, we observe that Equation 58 must be solved subject to p(t_f) = s. In the present case this can actually be carried out analytically. For ease of presentation, we show below the solution for s = 0; separating variables in Equation 60 gives

$$p(t) = \frac{p_1 p_2\left[1 - e^{-2\alpha(t_f - t)}\right]}{p_2 - p_1 e^{-2\alpha(t_f - t)}}$$

In the limit it is seen that

$$\lim_{(t_f - t) \to \infty} p(t) = \frac{r}{b^2}(a + \alpha) = p_1 \tag{66}$$

as given by Equation 59.

OPTIMAL FILTERING

In this section we turn to a problem which at first glance seems to have only a superficial bearing on the previous material but, as we shall point out, really bears an intimate relationship. The problem concerns the case where the state variables in x(t) are only partially measurable. In other words, the state x(t) cannot be directly measured experimentally; rather, we can measure some alternate linear combination of the states, y(t). Because of measurement noise, we really observe only an estimate of y(t). The question immediately arises as to whether it is possible to reconstruct x(t) from these estimated values of y(t). It should be relatively obvious that if such a reconstruction can be carried out, then the dual features of filtering (estimation) plus control can be placed into an adaptive-type feedback loop. The basic work which we shall briefly quote here results from Kalman and Bucy (11), but the interested reader should consult the work of Pearson (23), Schultz (25), Kishi (13), and Arimoto (1).

In more specific terms we assume that we have a random process described by the linear model

$$\dot{x}(t) = A(t)x(t) + w(t)$$

and the relationship between the message and the observed variable

$$z(t) = H(t)x(t) + v(t)$$

where w(t) and v(t) are random noise vectors. Given the observed signal z(τ) over the period of time t₀ ≤ τ ≤ t₁ (starting at t₀ and going to t₁), we desire to find the best estimate x̂(t₁) of x(t₁) at any time t₁. Alternately, one could ask to find a linear operator on the z(τ) at time t₁ to yield the best estimate. The "best" estimate will here be characterized as minimizing the square of the difference, [x(t₁) − x̂(t₁)]² = [x̃(t₁)]²; in other words, the best estimate will be obtained in terms of some minimization of x̃(t₁). The assumption is made that w(t) and v(t) are independent random processes (Gaussian white noise) with zero means and covariance matrices

$$\operatorname{cov}[w(t), w(\tau)] = Q(t)\,\delta(t - \tau)$$
$$\operatorname{cov}[v(t), v(\tau)] = E(t)\,\delta(t - \tau)$$
$$\operatorname{cov}[w(t), v(\tau)] = 0 \quad \text{for all } t, \tau \tag{69}$$

where Q and E are symmetric, positive-definite matrices and δ(t − τ) is the Dirac delta (impulse) function. Now we ask to find that estimate of x(t₁), called x̂(t₁), such that

$$E[\hat{x}(t_1)] = E[x(t_1)] \tag{70}$$

and

$$\operatorname{cov}[\tilde{x}(t_1), \tilde{x}(t_1)] = \text{minimum} \tag{71}$$

In Equation 71 we see that the square of x̃(t₁) is minimized, where x̃(t₁) is a measure of the error in the estimation of x(t₁). In the normal sense, then, x̂(t₁) will be an unbiased minimum-variance estimate of x(t₁).
The results of the theoretical analysis of this problem may be summarized by the following three Equations 72-74:

(a) There exists a relationship between x̂(t) and the observations z(t) in terms of a feedback system

$$\dot{\hat{x}}(t) = A(t)\hat{x}(t) + K(t)[z(t) - H(t)\hat{x}(t)] \tag{72}$$

The initial condition of this equation, x̂(t₀), is known.

(b) The gain matrix K(t) is given by

$$K(t) = P(t)H'(t)E^{-1}(t) \tag{73}$$

with P(t) = cov[x̃(t), x̃(t)].

(c) Further, P(t) satisfies the variance (or Riccati) equation

$$\dot{P}(t) = A(t)P(t) + P(t)A'(t) - P(t)H'(t)E^{-1}(t)H(t)P(t) + Q(t) \tag{74}$$

with P(t₀) = cov[x̃(t₀), x̃(t₀)].
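A minimal simulation sketch of Equations 72-74 for a scalar message and observation; the model coefficients, noise intensities, and the simple Euler time-stepping are all illustrative assumptions.

```python
# Scalar Kalman-Bucy filter: estimate Eq. 72, gain Eq. 73, variance Eq. 74.
import numpy as np

rng = np.random.default_rng(0)
a, h, Qn, En, dt, T = -1.0, 1.0, 0.5, 0.1, 0.01, 10.0
n = int(T / dt)
x, xhat, P = 1.0, 0.0, 1.0              # P(t0) = variance of initial error

for _ in range(n):
    # simulate the message and the noisy observation (Euler-Maruyama)
    x += a * x * dt + np.sqrt(Qn * dt) * rng.standard_normal()
    z = h * x + np.sqrt(En / dt) * rng.standard_normal()
    # variance equation (74) and gain (73)
    K = P * h / En
    P += (2 * a * P - P * h / En * h * P + Qn) * dt
    # estimate equation (72)
    xhat += (a * xhat + K * (z - h * xhat)) * dt

print("final estimate error:", x - xhat, " variance P:", P)
```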
It is apparent from these equations and those given previously for the optimal control problem that there exists a dual relationship between the two problems; i.e., the Riccati equation appears in the solution of both problems. This duality is shown in Table II, and one may then say, "The solutions of the optimal estimation (filtering) problem and of the optimal control problem are equivalent under the duality relations of Table II." We see that the one fundamental difference is in the time variable; this is, of course, completely expected, since in the control problem we seek information ahead of the present time, whereas in the filtering problem we seek present information based upon measurements made in the past.

TABLE II. EQUIVALENCE IN FILTERING AND CONTROL PROBLEMS

Filtering    Control
A            A′
B            H′
H            B′
P            P
Q            Q
E            R
t₀           t_f
t            −t

Of further importance is the fact that a completely controllable control system is the dual of a completely observable filtering system. As a result, all that was said regarding the control and stability in the previous sections of this paper carries over directly to the filtering problem under the rules of Table II. On this basis the Kalman Automatic Synthesis Program is directly applicable to the filtering problem, and the results can be obtained with a minimum of difficulty.

We have here emphasized the positive features associated with the filtering problem. Our main endeavor was to connect the filtering problem with the control problem. While we shall now consider the nonlinear control problem, it does not seem worthwhile here to do the same for the filter problem. In the latter case not only nonlinearities but non-Gaussian white noise would be involved. For a detailed discussion of this problem the reader is referred to the work of Sorenson (27).

NONLINEAR CONTROL PROBLEM

All of the previous information and results are possible because the system model was linear and the minimization used a quadratic criterion for optimality. The next immediate question would be to ask if this special problem has any applicability to the nonlinear systems which the chemical engineer may encounter. We shall now show that the answer to this question is yes and that an iterative algorithm can be evolved which uses the linear quadratic information. Let us first repeat the information we originally obtained; namely, that for Equations 1 and 3,

$$\dot{x}(t) = Ax(t) + Bu(t) \tag{76 or 1}$$

$$2I(x, 0, t_f; u) = \int_0^{t_f} [(x, Qx) + (u, Ru)]\,dt, \quad \lambda = 0 \tag{77 or 3}$$

the optimal control, trajectory, and canonical equations are given by Equations 20, 22, and 23:

$$u^* = -R^{-1}B'z = -R^{-1}B'Px \tag{78 or 20}$$

$$\dot{x} = [A - BR^{-1}B'P]x \tag{79 or 22}$$

$$\begin{bmatrix}\dot{x}\\ \dot{z}\end{bmatrix} = \begin{bmatrix}A & -BR^{-1}B'\\ -Q & -A'\end{bmatrix}\begin{bmatrix}x\\ z\end{bmatrix} \tag{80 or 23}$$

λ has been chosen equal to 0, and thus P satisfies the Riccati equation with P(t_f) = 0. This choice of λ = 0 merely makes the rest of the present discussion easier to handle; the case λ ≠ 0 can be included if desired. Now we ask what the equivalent results would have been if, instead of Equations 76 and 77, we had started with

$$\dot{x}(t) = Ax(t) + Bu(t) \tag{81}$$

$$2I = \int_0^{t_f} \left( \begin{bmatrix}x\\ u\end{bmatrix}, \begin{bmatrix}Q & N\\ N' & R\end{bmatrix}\begin{bmatrix}x\\ u\end{bmatrix}\right) dt \tag{82}$$

Equation 82 can also be written as (merely expand the matrices)

$$2I = \int_0^{t_f} [(x, Qx) + 2(x, Nu) + (u, Ru)]\,dt \tag{83}$$
and we see that Equations 77 and 83 differ by the mixed quadratic term 2(x, Nu). If we carry out the same analysis on Equations 81-83 as we have previously done for Equations 76-78, there results:

$$u^* = -R^{-1}[N'x + B'z] \tag{84}$$

$$\dot{x} = [A - BR^{-1}N' - BR^{-1}B'P]x \tag{85}$$

$$\begin{bmatrix}\dot{x}\\ \dot{z}\end{bmatrix} = \begin{bmatrix}A - BR^{-1}N' & -BR^{-1}B'\\ -Q + NR^{-1}N' & -A' + NR^{-1}B'\end{bmatrix}\begin{bmatrix}x\\ z\end{bmatrix} \tag{86}$$

It is obvious that when N = 0 this second set of equations degenerates to the first. By comparing the two sets of equations, we see that if we choose

$$\hat{A} = A - BR^{-1}N' \tag{87}$$

$$\hat{Q} = Q - NR^{-1}N' \tag{88}$$

the two systems, i.e., Equations 76-80 and Equations 81-86, are exactly equivalent. Thus the solution of either system can be obtained from the solution of the other. Alternatively, this equivalence can be shown by a simple change of variable. Thus if we let

$$\bar{u} = u + R^{-1}N'x \tag{89}$$

in Equations 81 and 83, the result is:

$$\dot{x}(t) = [A - BR^{-1}N']x(t) + B\bar{u}(t)$$

$$2I = \int_0^{t_f} [(x, (Q - NR^{-1}N')x) + (\bar{u}, R\bar{u})]\,dt$$
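A minimal numerical check of this equivalence, with illustrative matrices and an arbitrary small cross-weight N: solving the transformed problem of Equations 87-88 yields a P that satisfies the mixed-term steady-state Riccati condition of the original problem.

```python
# Verifying the Equations 87-89 equivalence for the mixed-term index.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
N = np.array([[0.2], [0.1]])
Rinv = np.linalg.inv(R)

A_hat = A - B @ Rinv @ N.T                   # Eq. 87
Q_hat = Q - N @ Rinv @ N.T                   # Eq. 88
P = solve_continuous_are(A_hat, B, Q_hat, R)

# mixed-term steady state: A'P + PA + Q - (PB+N) R^-1 (B'P+N') = 0
res = A.T @ P + P @ A + Q - (P @ B + N) @ Rinv @ (B.T @ P + N.T)
print("residual ~ 0:", np.allclose(res, 0))
print("gain K = R^-1 (B'P + N'):", Rinv @ (B.T @ P + N.T))
```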
Equations 87 and 88 follow directly. With this preliminary information in hand, we turn to the nonlinear control problem, which can be stated as: given

$$\dot{x}(t) = f(x, u); \quad x(0) = x_0 \tag{90}$$

find that u(t) which minimizes the performance index

$$I = \int_0^{t_f} J(x, u)\,dt, \quad t_f \text{ fixed} \tag{91}$$

Both f(x, u) and J(x, u) are general functions of x and u. Let us now linearize the system model around any given trajectory x̄ (any nominal trajectory corresponding to ū and z̄); i.e.,

$$\delta x = x - \bar{x}, \qquad \delta u = u - \bar{u}$$

converts Equation 90, with a Taylor series up to only linear terms, into

$$\delta\dot{x} = \bar{f}_x\,\delta x + \bar{f}_u\,\delta u \tag{92}$$

In this equation f̄_x = (∂f′/∂x)′ and f̄_u = (∂f′/∂u)′, with the overbar indicating that the derivative is evaluated along ū, x̄. Equation 92 has the same form and features as Equation 76 except that f̄_x and f̄_u are time-varying matrices, since they vary along the entire trajectory. Thus, Equation 92 could be written as

$$\delta\dot{x} = \bar{A}(t)\,\delta x + \bar{B}(t)\,\delta u \tag{93}$$

Next consider the variational problem of minimizing I subject to the system Equations 90 and a fixed control time t_f. The standard procedure for solving this problem (2, 15) is to form an augmented index in which the system equation is attached to the original performance index by Lagrange multipliers or adjoint variables. Thus

$$I_a = I + \int_0^{t_f} (z, (f - \dot{x}))\,dt \tag{94}$$

is the augmented index. This may be rewritten slightly by defining the Hamiltonian function

$$H = J + (z, f) \tag{95}$$

so that Equation 94 becomes

$$I_a = \int_0^{t_f} [H - (z, \dot{x})]\,dt \tag{96}$$

If we expand Equation 96 in a Taylor series around the nominal trajectory ū and x̄, but retain all terms through those of second order, there results

$$\delta I_a = \int_0^{t_f} [(\dot{\bar{z}} + \bar{H}_x, \delta x) + (\bar{H}_u, \delta u)]\,dt + \tfrac{1}{2}\int_0^{t_f} [(\delta x, \bar{H}_{xx}\delta x) + 2(\delta x, \bar{H}_{xu}\delta u) + (\delta u, \bar{H}_{uu}\delta u)]\,dt \tag{97}$$

The first term on the right-hand side of Equation 97 is the linear part of the Taylor series and the second term is the quadratic part. The terminology is as before; namely, H̄_x = ∂H/∂x evaluated along ū and x̄, etc. All of H̄_xx, H̄_xu, and H̄_uu are time-varying matrices. Evoking the canonical equations (the necessary equations for an optimal trajectory),

$$\dot{\bar{z}} = -\bar{H}_x, \qquad \bar{H}_u = 0$$

Equation 97 becomes

$$\delta I_a = \tfrac{1}{2}\int_0^{t_f} [(\delta x, \bar{H}_{xx}\delta x) + 2(\delta x, \bar{H}_{xu}\delta u) + (\delta u, \bar{H}_{uu}\delta u)]\,dt \tag{98}$$

TABLE III. EQUIVALENCE BETWEEN LINEAR AND NONLINEAR CONTROL PROBLEMS

Linear    Nonlinear
A         f̄_x
B         f̄_u
Q         H̄_xx
N         H̄_xu
R         H̄_uu
x         δx
z         δz
I         δI_a

Note that by the various expansions we have converted Equations 90-91 into Equations 92 and 98; these latter two equations look familiar and, in fact, if we use the equivalence shown in Table III, we see that we have Equations 81 and 83 or, with the change in variables of Equation 89, Equations 76 and 77. Thus we may immediately say that if ū and x̄ were the optimal trajectory, the optimal control and the canonical equations would be given by (see Equations 84 and 86):

$$\delta u^* = -\bar{H}_{uu}^{-1}(\bar{H}_{xu}'\,\delta x + \bar{f}_u'\,\delta z) \tag{99}$$

$$\begin{bmatrix}\delta\dot{x}\\ \delta\dot{z}\end{bmatrix} = \begin{bmatrix}\bar{f}_x - \bar{f}_u\bar{H}_{uu}^{-1}\bar{H}_{xu}' & -\bar{f}_u\bar{H}_{uu}^{-1}\bar{f}_u'\\ -(\bar{H}_{xx} - \bar{H}_{xu}\bar{H}_{uu}^{-1}\bar{H}_{xu}') & -(\bar{f}_x - \bar{f}_u\bar{H}_{uu}^{-1}\bar{H}_{xu}')'\end{bmatrix}\begin{bmatrix}\delta x\\ \delta z\end{bmatrix} \tag{100}$$

$$\bar{H}_u = 0 \tag{101}$$

In what follows, C, D, and E represent the large matrices in Equation 100; i.e., the coefficient matrix is [[C, D], [−E, −C′]]. Unfortunately, at the start of the iteration ū and x̄ are not optimal and thus H̄_u ≠ 0; as a result, Equations 99-101 represent the conditions which we will ultimately have to achieve. At this point these conditions are not yet satisfied.

Before detailing the solution to this problem we wish to mention one other point. Since we have carried out Taylor series expansions to second order, the second-order terms must be positive for the optimum to be at least a local minimum. This means that we should look at the matrix of the quadratic form in Equation 98 to see if it is positive-semidefinite. For the original linear quadratic control problem this matrix is

$$\begin{bmatrix}Q & N\\ N' & R\end{bmatrix}$$

and from the assumed properties of Q and R this is positive-semidefinite. In the present case, a brief look at Equation 100 shows that if

$$\bar{H}_{uu} \text{ positive-definite}, \qquad \bar{H}_{xx} - \bar{H}_{xu}\bar{H}_{uu}^{-1}\bar{H}_{xu}' \text{ positive-semidefinite} \tag{102}$$

the solution to our problem will be at least a local minimum.

To solve the new problem (with H̄_u ≠ 0), we can form a new Hamiltonian, augment the index with the linearized system equation and δz, and use the fact that the partial derivative of the new Hamiltonian with respect to δu should be 0. Note that this is an application of the same procedure used above to get the first set of modified equations; in the present context we are solving the so-called accessory minimum problem. This set of steps yields (we shall not show the details of the derivation):

$$\delta u^* = -\bar{H}_{uu}^{-1}(\bar{H}_{xu}'\,\delta x + \bar{f}_u'\,\delta z + \bar{H}_u) \tag{103}$$

$$\begin{bmatrix}\delta\dot{x}\\ \delta\dot{z}\end{bmatrix} = \begin{bmatrix}C & D\\ -E & -C'\end{bmatrix}\begin{bmatrix}\delta x\\ \delta z\end{bmatrix} + \begin{bmatrix}v(t)\\ w(t)\end{bmatrix} \tag{104}$$

where v(t) = −f̄_uH̄_uu⁻¹H̄_u and w(t) = H̄_xuH̄_uu⁻¹H̄_u. Note that if H̄_u = 0 in these equations, we recover the necessary conditions of Equations 99-101. To solve the equations in δx and δz, we may invoke the method of solution used originally for the linear quadratic problem; namely, let

$$\delta z = P(t)\,\delta x + q(t) \tag{105}$$

This is equivalent to the Riccati transformation used in Equation 28, except that q(t) has been added to take care of the terms in H̄_u in Equation 104. If we substitute Equation 105 into Equations 103 and 104 we get

$$\delta u^* = -\bar{H}_{uu}^{-1}\{[\bar{H}_{xu}' + \bar{f}_u'P(t)]\,\delta x + \bar{f}_u'q + \bar{H}_u\} \tag{106}$$

and the fact that q(t) and P(t) must satisfy

$$\dot{q}(t) + [P(t)D(t) + C'(t)]q(t) + P(t)v(t) - w(t) = 0, \qquad q(t_f) = 0 \tag{107}$$

$$\dot{P}(t) + C'(t)P(t) + P(t)C(t) + P(t)D(t)P(t) + E(t) = 0, \qquad P(t_f) = 0 \tag{108}$$

Note that, as expected, Equation 108 is a form of the Riccati equation which we previously encountered.
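The iteration built on these equations is stated as steps (a)-(e) below. The following is a minimal sketch of it for a scalar problem, ẋ = −x³ + u with J = (qx² + ru²)/2; the model, weights, step counts, and the simple Euler integration are all illustrative assumptions of this sketch.

```python
# One-pass-per-iteration second variation method for a scalar problem.
import numpy as np

q, r, tf, x0, nsteps = 1.0, 1.0, 2.0, 1.0, 2000
dt = tf / nsteps
f = lambda x, u: -x**3 + u

u = np.zeros(nsteps)                       # (a) initial control guess
for it in range(8):
    # (a) forward pass for the nominal trajectory x_bar
    x = np.empty(nsteps + 1); x[0] = x0
    for i in range(nsteps):
        x[i+1] = x[i] + dt * f(x[i], u[i])
    # (b) adjoint backward: z' = -H_x = -(q x - 3 x^2 z), z(tf) = 0
    z = np.empty(nsteps + 1); z[-1] = 0.0
    for i in range(nsteps, 0, -1):
        z[i-1] = z[i] + dt * (q * x[i] - 3 * x[i]**2 * z[i])
    # (c) P and q backward, Equations 107-108; here H_xu = 0, H_uu = r
    P = np.empty(nsteps + 1); qq = np.empty(nsteps + 1)
    P[-1] = qq[-1] = 0.0
    for i in range(nsteps, 0, -1):
        C = -3 * x[i]**2; D = -1.0 / r; E = q - 6 * x[i] * z[i]
        Hu = r * u[i-1] + z[i]
        v = -Hu / r                         # w = 0 since H_xu = 0
        P[i-1] = P[i] + dt * (2 * C * P[i] + D * P[i]**2 + E)
        qq[i-1] = qq[i] + dt * ((P[i] * D + C) * qq[i] + P[i] * v)
    # (d) forward pass with the feedback-corrected control, Eq. 106
    xn = np.empty(nsteps + 1); xn[0] = x0; un = np.empty(nsteps)
    for i in range(nsteps):
        Hu = r * u[i] + z[i]
        un[i] = u[i] - (P[i] * (xn[i] - x[i]) + qq[i] + Hu) / r
        xn[i+1] = xn[i] + dt * f(xn[i], un[i])
    u = un
print("max |H_u| after iterations:", np.max(np.abs(r * u + z[:-1])))  # (e)
```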
On this basis we are led to the following algorithm (called the second variation method) for solving our original nonlinear control problem:

(a) Guess ū(t), and integrate the system Equation 90 starting at x(0) to yield x̄(t)
(b) Integrate ż = −H̄_x backward starting with z(t_f) = 0. Along this backward pass also evaluate H̄_xx, H̄_xu, H̄_uu, H̄_uu⁻¹, and H̄_u
(c) At the same time as (b) is carried out, integrate q(t) and P(t) backward (Equations 107 and 108) to yield q(t) and P(t)
(d) Writing Equation 106 in terms of u_new − ū_old,

$$u_{\text{new}} = \bar{u}_{\text{old}} - \bar{H}_{uu}^{-1}\{[\bar{H}_{xu}' + \bar{f}_u'P(t)](x_{\text{new}} - \bar{x}_{\text{old}}) + \bar{f}_u'q + \bar{H}_u\}$$

we can go back to (a) using u_new
(e) The entire procedure is repeated until H̄_u = 0 within some error bound. At this point we have satisfied the necessary conditions for the optimal control, and the answer is available

As with all algorithms there are certain problems associated with the present one; in particular, H̄_uu must be positive-definite, and conjugate points may occur (the Riccati equation becomes unstable). This latter point is related to the controllability of the system. If ū(t) is chosen close enough to u*, these problems will probably not occur. For a further discussion on these points, see Mitter (22), Lee (16), Breakwell et al. (4), Breakwell and Ho (3), Merriam (19-21), and Luus and Lapidus (18). The point to be made is that the solution outlined follows along the lines of the previous linear quadratic case and that many parts of the ASP program can be used directly. Thus our linear quadratic solution has direct application to the nonlinear control problem.

LITERATURE CITED

(1) Arimoto, S., Inform. Control 9, 79 (1966).
(2) Athans, M., Falb, P. L., "Optimal Control," McGraw-Hill, New York, 1966.
(3) Breakwell, J. V., Ho, Y., Intern. J. Eng. Sci. 1, 565 (1965).
(4) Breakwell, J. V., Speyer, J. L., Bryson, A. E., J. SIAM Control 1, 193 (1963).
(5) Dreyfus, S. E., "Dynamic Programming and the Calculus of Variations," Academic Press, New York, 1965.
(6) Gel'fand, I. M., Fomin, S. V., "Calculus of Variations," Prentice-Hall, New York, 1963.
(7) Johnson, C. D., Chapter in "Advances in Control Systems, 2," Academic Press, New York, 1965.
(8) Kalman, R. E., Bol. Soc. Matematica Mexicana, p. 102 (1960).
(9) Kalman, R. E., "The Theory of Optimal Control and the Calculus of Variations," Chap. 16 in "Mathematical Optimization Techniques," R. Bellman, ed., Univ. of California Press, Berkeley, 1963.
(10) Kalman, R. E., Bertram, J. E., J. Basic Eng. 82, 380 (1960).
(11) Kalman, R. E., Bucy, R. S., Ibid., 83, 95 (1961).
(12) Kalman, R. E., Englar, T. S., "A User's Manual for ASP C," RIAS Rept., Baltimore, Md., 1966.
(13) Kishi, F. H., Chapter in "Advances in Control Systems, 1," Academic Press, New York, 1964.
(14) Lapidus, L., Chem. Eng. Symp. Ser. 61 (55), 88 (1965).
(15) Lapidus, L., Luus, R., "Optimal Control of Engineering Systems," Blaisdell, 1967.
(16) Lee, I., Inform. Control 8, 589 (1965).
(17) Luenberger, D. G., IEEE Trans. Auto. Control 10, 202 (1965).
(18) Luus, R., Lapidus, L., "The Control of Nonlinear Systems. II. Convergence by Combined First and Second Variations," A.I.Ch.E. J. (1966-67).
(19) Merriam, C. W., Inform. Control 8, 215 (1965).
(20) Merriam, C. W., J. SIAM Control 2, 1 (1964).
(21) Merriam, C. W., "Optimization Theory and the Design of Feedback Control Systems," McGraw-Hill, New York, 1964.
(22) Mitter, S. K., Automatica 3, 135 (1966).
(23) Pearson, J. D., "On the Duality Between Estimation and Control," J. SIAM Control, in press.
(24) Pontryagin, L. S., et al., "The Mathematical Theory of Optimal Processes," Interscience, New York, 1962.
(25) Schultz, P. R., Chapter in "Advances in Control Systems, 1," Academic Press, New York, 1964.
(26) Siebenthal, C. D., Aris, R., Chem. Eng. Sci. 19, 729 (1964).
(27) Sorenson, H. W., Chapter in "Advances in Control Systems, 3," Academic Press, New York, 1966.
(28) Tchamran, A., J. Frank. Inst. 280, 493 (1965).