The Cascade Optimization Algorithm: A New Distributed Approach for the Stochastic Optimization of Engineering Applications

Antonis C. Kokossis,*,†,‡ Patrick Linke,†,§ and Siyu Yang†

† Center for Process & Information Systems Engineering, University of Surrey, Guildford, GU2 7XH, United Kingdom
‡ School of Chemical Engineering, National Technical University of Athens, GR-15780, Athens, Greece
§ Department of Chemical Engineering, Texas A&M University at Qatar, PO Box 23874, Doha, Qatar
ABSTRACT: This paper introduces a new stochastic optimization approach in the form of a cascade optimization algorithm. The algorithm incorporates concepts from Markov processes while eliminating their inherent sequential nature, a major obstacle to exploiting advances in distributed computing infrastructures. The method introduces partitions and pools to store intermediate solutions and the corresponding objectives. A Markov process increases the population of the partitions and pools, and the population is redistributed periodically according to an external criterion. With the use of partitions and pools, multiple Markov processes can be launched simultaneously for different partitions and pools. The cascade optimization algorithm holds potential on two different fronts. The first is its deployment in parallel and distributed computing environments. Second, through the storage of solutions in the pools, the algorithm offers cost-effective means to analyze intermediate solutions, visualize progress, and integrate optimization with data and/or knowledge management techniques without additional burden to the process.
INTRODUCTION

Stochastic optimization algorithms have found many applications in complex engineering problem solving1–5 and in a wide range of problems in chemical engineering.6–14 The advantages and disadvantages of stochastic methods have been reviewed by many authors;15–19 still, the most widely applied algorithms include Simulated Annealing (SA),20 Tabu Search (TS),21 Genetic Algorithms (GA),22 and Ant Colony Optimization (ACO).23 These methods follow a probabilistically driven random search. Owing to this randomness, stochastic methods can explore the search space fully and escape locally optimal solutions. However, the randomness also causes inefficiencies in the search, and the algorithms tend to require long computational times to converge. The use of conceptual methods to address the problem complexity offers solutions to specific problems but is difficult to generalize.24,25

Many contributions in the literature attempt to address these shortcomings, either by parallelizing or distributing search efforts or by improving the search strategies of the individual stochastic algorithms. The former route appears particularly promising in view of the opportunities presented by the emergence of advanced computing infrastructures such as grids, cyber-infrastructures, and clouds, through which vast computational resources can be made available in networked environments. The latter route relies on better capture and management of knowledge to guide the search. With a few exceptions,26,27 current algorithms do not record and actively manage intermediate system information and only enable a passive exploitation of knowledge through search meta-heuristics.

Parallel versions of SA,28 TS,29 GA,30 and ACO31 have all been proposed, and the application of such parallel algorithms has been demonstrated to improve the overall time for optimization studies. However, because of the synchrony and sequential nature of the search strategies utilized by these algorithms, the efficiency with which large computational resources can be exploited is limited. Thus, the opportunity to speed up the convergence of existing stochastic algorithms by making more computational resources available is also limited. This is particularly true for advanced computing infrastructures, where parallel and distributed computing applications are no longer confined to a localized cluster of computers at a single location, but are instead deployed on grid networks across the Internet. Most existing parallel algorithms follow a synchronous mode in which the faster workers must wait for the slower ones. Such parallelization suffers from high communication traffic among computational resources, and the benefits in terms of speeding up convergence are limited. The development of a fully distributed optimization algorithm in which the computational resources run simultaneously, in an asynchronous mode, is still a largely unresolved challenge. However, such an algorithm is required to enable effective exploitation of large distributed computational resources.

This paper introduces a new stochastic optimization algorithm that records and actively manages intermediate system information. The algorithm enables asynchronous operation and can capitalize on knowledge acquisition techniques. We refer to the new algorithm as "the cascade optimization algorithm (COPT)". This paper focuses on the theoretical concepts of the algorithm. A detailed description of the implementation and the significance of the parameters of the algorithm, as well as applications to engineering problems, will be the subject of future publications.
A fully distributed stochastic optimization algorithm is proposed by incorporating concepts from the Markov process of SA. It introduces conceptual partitions and pools to store intermediate solutions and the corresponding objective values. The partitions and pools grow as the Markov process generates new intermediate solutions. Populations within partitions and pools are redistributed periodically, according to an external criterion. With the use of partitions and pools, multiple Markov processes can be executed simultaneously without interactions. Thus, the new algorithm is suitable for distributed computing techniques.

The general formulation of optimization problems takes the following form:

min f(x, y)
s.t.  h(x, y) = 0
      g(x, y) ≤ 0
      x ∈ X ⊆ R^n,  y ∈ Y ⊆ {0, 1}^n    (1)

where f(x, y) is the objective function, and h(x, y) and g(x, y) are the constraints.

BASIC CONCEPTS AND DEFINITIONS

Proofs of the lemmata introduced in this section are presented in the Appendix. Let S be the feasible region of (1):

S = {(x, y) | h(x, y) = 0 ∧ g(x, y) ≤ 0, x ∈ X, y ∈ Y}    (2)

and D the domain of (1):

D = {f(x, y) | ∀(x, y) ∈ S}    (3)

Let G be the superset of disjoint partitions G_j in S, so that

∪_j G_j ⊆ S    (4)

with G_j being ordered sets of finite elements in S:

G_j = {s_{1,j}, s_{2,j}, ..., s_{M_j,j}},  s_{i,j} ∈ S,  i = 1, 2, ..., M_j,  j = 1, 2, ..., n_p    (5)

where n_p is the number of partitions/pools. Let F be a superset of numerical functions. For each h ∈ F, including the objective function f of (1), s_{i,j} ∈ S can be mapped so that

h: s_{i,j} → h(s_{i,j}) ≡ h_{i,j} ∈ R    (6)

Defined over G_j, f can be used to map G_j to R^{M_j}, so that

f_j: G_j → f(G_j) = {f(s_{i,j})}_{i=1}^{M_j}    (7)

and

f'_j: G_j → f'(G_j) = {f(s_{i,j})}_{i=1}^{M_j} ∧ f(s_{i,j}) ≥ f(s_{k,j}) ∀ i < k    (8)

Definition 1: A set P_j is a pool of G if ∃ G_j ⊂ G so that P_j is the domain of (7).

Definition 2: A set P'_j is an ordered pool of G if ∃ G_j ⊂ G so that P'_j is the domain of (8).

Illustration 1: Let the optimization problem be

min f = x_1^2 + x_2^2 + y
s.t.  x_1 + x_2 = y
      x_1 − x_2·y ≤ 0
      x_1, x_2, y > 0

with the feasible region

S = {(x_1, x_2, y) | x_1 + x_2 = y ∧ x_1 − x_2·y ≤ 0, x_1, x_2, y > 0}

Let A, B, C, D, E, F, G, H, and I be the available points in S; Table 1 presents these points.

Table 1. Available Points in S

point   x_1    x_2    y
A       37.4   14.6   52
B       20.2   41.8   62
C       1.7    34.3   36
D       16.4   18.6   35
E       12.7   3.3    16
F       35.7   39.3   75
G       0.4    3.6    4
H       1.7    21.3   23
I       52.4   7.6    60

Let the superset of partitions G = {G_1, G_2, G_3} ⊆ S with

G_1 = {A, B, C}    G_2 = {D, E, F}    G_3 = {I, H, G}
According to f, G_1, G_2, and G_3 are mapped into P_1, P_2, and P_3:

G_1 → P_1 = {1665.8, 2218.4, 1215.1}
G_2 → P_2 = {649.8, 189.2, 2894.1}
G_3 → P_3 = {2863.9, 479.5, 17.1}

Based on Definitions 1 and 2, P_1 and P_2 are pools and P_3 is an ordered pool.

Lemma A: G_j is a σ-algebra of G and P_j is a σ-algebra of D.

Lemma B: The objects (G, G_j) and (D, P_j) are measurable spaces.

Definition 3: Q_j is the population of G_j iff Q_j is the rank of P_j(G_j):

Q_j = rank(G_j) = rank(P_j)    (9)

Let P_j be as defined in Definition 1, and let an h-metric h_j be defined over P_j as a mapping from R^{M_j} to R, so that

h_j: P_j → h(P_j) = h({f_{i,j}}_{i=1}^{M_j})    (10)

Lemma C: The function

h^{sm}(P_j) = Σ_{i=1}^{M_j} |f(s_{i,j})|    (11)

is a measure of the object (D, P_j).
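To make the preceding definitions concrete, the following Python sketch reproduces Illustration 1. It is our illustrative reading of Definitions 1–3 and Lemma C, not code from the article; the names (points, h_sm) and the reading of "rank" as cardinality are our assumptions.

```python
# Points of Table 1: (x1, x2, y), all members of the feasible region S.
points = {
    "A": (37.4, 14.6, 52), "B": (20.2, 41.8, 62), "C": (1.7, 34.3, 36),
    "D": (16.4, 18.6, 35), "E": (12.7, 3.3, 16),  "F": (35.7, 39.3, 75),
    "G": (0.4, 3.6, 4),    "H": (1.7, 21.3, 23),  "I": (52.4, 7.6, 60),
}

def f(s):
    """Objective of Illustration 1: f = x1^2 + x2^2 + y."""
    x1, x2, y = s
    return x1**2 + x2**2 + y

# Partitions G_j are ordered sets of points (eq 5).
G = {1: ["A", "B", "C"], 2: ["D", "E", "F"], 3: ["I", "H", "G"]}

# A pool P_j is the image of its partition under f (eq 7, Definition 1).
P = {j: [f(points[name]) for name in Gj] for j, Gj in G.items()}

# An ordered pool sorts the objectives in decreasing order (eq 8, Definition 2).
P_ordered = {j: sorted(Pj, reverse=True) for j, Pj in P.items()}

# The population Q_j is the rank of the pool (eq 9, Definition 3),
# read here as its cardinality.
Q = {j: len(Pj) for j, Pj in P.items()}

# Lemma C's measure (eq 11): h_sm(P_j) = sum of |f(s_i,j)| over the pool.
def h_sm(Pj):
    return sum(abs(v) for v in Pj)

print({j: round(h_sm(Pj), 1) for j, Pj in P.items()})
# close to the article's rounded figures {1: 5099.3, 2: 3733.2, 3: 3360.5}
```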
Let h ∈ F and a set of n_p pools P_j with their h_j metrics. Let D_{n_p}^h be an ordered combination, so that

D_{n_p}^h = (P_1, P_2, ..., P_{n_p})    (12)

with

h_k > h_j  ∀ j > k    (13)

Definition 4: A combination of n_p pools P_j is called a cascade if ∃ h ∈ F so that D_{n_p}^h follows (13).

Illustration 2: Let us apply h_j = h^{sm}(P_j) to P_1, P_2, and P_3 of Illustration 1, so that

h_1 = h^{sm}(P_1) = f(A) + f(B) + f(C) = 5099.3
h_2 = h^{sm}(P_2) = f(D) + f(E) + f(F) = 3733.2
h_3 = h^{sm}(P_3) = f(G) + f(H) + f(I) = 3360.5

Based on Definition 4, the combination D_3^h = (P_1, P_2, P_3) is a cascade, following h_1 > h_2 > h_3.

The cascade domain A_{n_p} of D_{n_p}^h is defined as

A_{n_p} = ∪_{j=1}^{n_p} {f_{i,j}}_{i=1}^{M_j}    (14)

Lemma D: B(A_{n_p}, f_{i,j}) is a σ-algebra of A_{n_p} and

H_j = h^{sm}(P_j) / Σ_{j=1}^{n_p} h^{sm}(P_j)    (15)

is a probability measure of A_{n_p}.

Illustration 3: The domain of D_3^h in Illustration 2 is

A_3 = {2218.4, 1665.8, 1215.1, 2894.1, 649.8, 189.2, 2863.9, 479.5, 17.1}

Based on (15) and Illustration 2, the probability measures of A_3 are

H_1 = H(h^{sm}(P_1)) = 5099.3/12193 = 0.418
H_2 = H(h^{sm}(P_2)) = 3733.2/12193 = 0.306
H_3 = H(h^{sm}(P_3)) = 3360.5/12193 = 0.276
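As a companion to Illustrations 2 and 3, here is a short sketch of the cascade condition (13) and the probability measure of Lemma D, using the article's rounded pool values (again an illustrative sketch, not the authors' code):

```python
# Pools from Illustration 1 (rounded objective values per partition).
P = {
    1: [1665.8, 2218.4, 1215.1],
    2: [649.8, 189.2, 2894.1],
    3: [2863.9, 479.5, 17.1],
}

def h_sm(pool):
    """Summation metric of Lemma C (eq 11)."""
    return sum(abs(v) for v in pool)

# Cascade condition (13): the h-metric decreases along the pools.
h = {j: h_sm(pool) for j, pool in P.items()}
is_cascade = all(h[j] > h[j + 1] for j in range(1, len(P)))

# Probability measure over the cascade domain (eq 15).
total = sum(h.values())
H = {j: hj / total for j, hj in h.items()}

print(is_cascade)                               # True: h1 > h2 > h3
print({j: round(Hj, 3) for j, Hj in H.items()})
# {1: 0.418, 2: 0.306, 3: 0.276}, as in Illustration 3
```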
The probability measure H_k is used as the density function d_k. Let D_{n_p,k}^h be a sequence of cascades, A_{n_p,k} the domains of D_{n_p,k}^h, and d_k the probability density function of A_{n_p,k}.

Definition 5: d_z is a "profile" of D_{n_p,k}^h iff

lim_{k→∞} d_k = d_z    (16)

Definition 6: Let A'_{n_p} and A_{n_p} be the domains of D'^h_{n_p} and D^h_{n_p}, respectively. D'^h_{n_p} is defined as an inflection of D^h_{n_p} iff

A'_{n_p} ≡ A_{n_p}    (17)

Illustration 4: The partitions in Illustration 1 are G = {G_1, G_2, G_3} with

G_1 = {A, B, C}    G_2 = {D, E, F}    G_3 = {I, H, G}

Let the partitions G' = {G'_1, G'_2, G'_3} be

G'_1 = {A, B, D}    G'_2 = {C, E, I}    G'_3 = {F, H, G}

Using f, the cascades are

D_3^h = (P_1, P_2, P_3) with P_1 = {f(A), f(B), f(C)}, P_2 = {f(D), f(E), f(F)}, P_3 = {f(I), f(H), f(G)}

and

D'^h_3 = (P'_1, P'_2, P'_3) with P'_1 = {f(A), f(B), f(D)}, P'_2 = {f(C), f(E), f(I)}, P'_3 = {f(F), f(H), f(G)}

Since

A'_3 = {f(A), f(B), f(C), f(D), f(E), f(F), f(G), f(H), f(I)} = A_3

D'^h_3 is an inflection of D_3^h.

Expanding Partitions and Pools. Let a Markov process g_t^L: S → S be defined over a sequence of time periods t, where L is the length of the Markov process. Also let G_{j,t} = (s_{i,j,1}, s_{i,j,2}, ..., s_{i,j,t}), so that

g_t^L(s_{i,j,k}) = (s_{i,j,t+1}, s_{i,j,t+2}, ..., s_{i,j,t+L}) ∈ S,  k = 1, 2, ..., t    (18)

with (s_{i,j,t+1}, s_{i,j,t+2}, ..., s_{i,j,t+L}) observing Markov properties. G_{j,t+L} is given by

G_{j,t+L} = G_{j,t} ∪ g_t^L(s_{i,j,k})    (19)

Let the sequence of pools P_{j,t+L} be defined over G_{j,t+L}, so that

P_{j,t+L} = P_{j,t} ∪ {f(s_{i,j,t+1}), f(s_{i,j,t+2}), ..., f(s_{i,j,t+L})}    (20)
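The inflection of Illustration 4 can be stated compactly in code: points are reassigned among partitions while the union of objective values (the cascade domain, eq 14) is left unchanged. A minimal, self-contained sketch (our reading, with hypothetical names):

```python
from itertools import chain

# Table 1 points and the Illustration 1 objective, as in the earlier sketch.
points = {"A": (37.4, 14.6, 52), "B": (20.2, 41.8, 62), "C": (1.7, 34.3, 36),
          "D": (16.4, 18.6, 35), "E": (12.7, 3.3, 16),  "F": (35.7, 39.3, 75),
          "G": (0.4, 3.6, 4),    "H": (1.7, 21.3, 23),  "I": (52.4, 7.6, 60)}
f = lambda s: s[0]**2 + s[1]**2 + s[2]

def domain(G):
    """Cascade domain (eq 14): all objective values across the partitions."""
    return sorted(f(points[name]) for name in chain.from_iterable(G.values()))

G_original = {1: ["A", "B", "C"], 2: ["D", "E", "F"], 3: ["I", "H", "G"]}
G_shuffled = {1: ["A", "B", "D"], 2: ["C", "E", "I"], 3: ["F", "H", "G"]}

# The shuffled cascade is an inflection of the original iff the two
# domains coincide (eq 17).
print(domain(G_original) == domain(G_shuffled))  # True
```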
Illustration 5: Let the Markov process (L = 3) be

g_t^3(s_{i,j,k}) = (s_{i,j,t+1}, s_{i,j,t+2}, s_{i,j,t+3}) ∈ S,  k = 1, 2, ..., t

with

s_{i,j,t+1} = s_{i,j,k}/(t+2),  s_{i,j,t+2} = s_{i,j,t+1}/(t+3),  s_{i,j,t+3} = s_{i,j,t+2}/(t+4)

Let the partitions G_0 = {G_{1,0}, G_{2,0}, G_{3,0}} with

G_{1,0} = {x_1}    G_{2,0} = {x_2}    G_{3,0} = {x_3}

Using f, the cascade is D_{3,0}^h = (P_{1,0}, P_{2,0}, P_{3,0}) with

P_{1,0} = {f(x_1)}    P_{2,0} = {f(x_2)}    P_{3,0} = {f(x_3)}

Let g_0^3 map x_1, so that

x_1 → (x_1/2, x_1/6, x_1/24)

G_0 then evolves to G_3 = {G_{1,3}, G_{2,3}, G_{3,3}} with

G_{1,3} = {x_1, x_1/2, x_1/6, x_1/24}    G_{2,3} = {x_2}    G_{3,3} = {x_3}

Accordingly, D_{3,3}^h = (P_{1,3}, P_{2,3}, P_{3,3}) with

P_{1,3} = {f(x_1), f(x_1/2), f(x_1/6), f(x_1/24)}    P_{2,3} = {f(x_2)}    P_{3,3} = {f(x_3)}
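A sketch of Illustration 5's three-step expansion (eqs 18–20); the function name g_L3 is ours:

```python
def g_L3(s, t):
    """Markov process of Illustration 5 (L = 3): successive divisions of
    the previous point by t+2, t+3, and t+4."""
    s1 = s / (t + 2)
    s2 = s1 / (t + 3)
    s3 = s2 / (t + 4)
    return [s1, s2, s3]

x1 = 24.0            # an arbitrary starting point
G1 = [x1]            # partition G_{1,0}
G1 += g_L3(x1, t=0)  # eq 19: G_{1,3} = G_{1,0} U g_0^3(x1)
print(G1)            # [24.0, 12.0, 4.0, 1.0], i.e., {x1, x1/2, x1/6, x1/24}
```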
If d_z is a profile, D_{n_p,t}^h will be in equilibrium with respect to d_z iff

lim_{t→∞} d(A_{n_p,t}) = d_z    (21)
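Since the probability measure H_k serves as the density d_k, equilibrium can be monitored in practice by comparing successive measure vectors. A possible convergence test (the tolerance and the test itself are our assumptions, not a criterion from the article):

```python
def in_equilibrium(H_prev, H_curr, tol=1e-3):
    """Declare equilibrium (eq 21) when the pool probability measures stop
    changing between successive cascade updates."""
    return max(abs(a - b) for a, b in zip(H_prev, H_curr)) < tol

print(in_equilibrium([0.418, 0.306, 0.276], [0.417, 0.307, 0.276]))  # True
```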
DESCRIPTION OF THE ALGORITHM

The procedure of the cascade optimization algorithm is as follows:

Step 1: Select a number of pools to formulate a cascade (following Definition 1).
Step 2: Use a Markov process to increase the populations in the partitions and pools. The Markov process maps a point in a partition to new points at each iteration. The new points are placed into the partitions to increase their populations.
Step 3: Check whether the cascade follows eq 13 and develop inflections (following Definition 6), if required.
Step 4: Check the convergence of the algorithm and iterate from Step 2.

The convergence of the algorithm is determined by termination criteria that address controls on
• Population size
• Optimization progress
• Population feature

The details of the termination criteria are explained in a section below. Figure 1 outlines the main steps of the algorithm.

[Figure 1. Cascade optimization algorithm flowchart.]

Cascade Optimization Algorithm as a Markov Process. The cascade optimization algorithm employs a Markov process to generate new points, similar to the one in SA. A global parameter is used as the pool temperature and accounts for the level of uncertainty in each pool, as temperature does in SA. Each pool j is associated with a temperature T_j, and pools are sorted in decreasing T_j. If the top (T_1) and bottom (T_{n_p}) temperatures are fixed, T_j is calculated from the following expression:

T_j ≡ T(j) = [(T_1 − T_{n_p})/(1 − n_p^r)] j^r + [T_1 − (T_1 − T_{n_p})/(1 − n_p^r)]    (22)

The parameter r controls the convexity (concavity) of this function, which is twice differentiable, with second derivative

f_T''(j) = r(r − 1) [(T_1 − T_{n_p})/(1 − n_p^r)] j^{r−2}    (23)

With different r, eq 22 produces different temperature profiles, which resemble the cooling schedules employed in SA. These cooling schedules can be classified into three major categories (a numerical sketch follows the list):
• With r > 1, f_T''(j) < 0: the cooling schedules are strictly concave functions.
• With r < 1, f_T''(j) > 0: the cooling schedules are strictly convex functions.
• With r = 1, f_T''(j) = 0: the cooling schedules are linear functions.
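A minimal numerical sketch of eq 22 (illustrative; pool_temperature is our name, not the authors' implementation):

```python
def pool_temperature(j, T1, Tnp, np_, r=1.0):
    """Temperature of pool j from the cooling schedule of eq 22.
    r > 1 gives concave, r < 1 convex, and r = 1 linear schedules."""
    a = (T1 - Tnp) / (1 - np_**r)
    return a * j**r + (T1 - a)

# Illustration 6's setting: T1 = 100, T4 = 1, r = 1, four pools.
print([round(pool_temperature(j, 100, 1, 4)) for j in range(1, 5)])
# [100, 67, 34, 1]
```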
Figure 2 illustrates the three classes of cooling schedules, where the X-axis denotes the index j and the Y-axis denotes T_j.

[Figure 2. Cooling schedules.]

Illustration 6: To illustrate the assignment of T_j according to the cooling schedule, let us set up an instance of the algorithm at t = 0, using four partitions and pools:

G_0 = {G_{1,0}, G_{2,0}, G_{3,0}, G_{4,0}}

Let four points,

x_1 = 12    x_2 = 11    x_3 = 9    x_4 = 8

be distributed in the partitions, so that

G_{1,0} = {x_1}    G_{2,0} = {x_2}    G_{3,0} = {x_3}    G_{4,0} = {x_4}

Using f, the cascade is

D_{4,0}^h = (P_{1,0}, P_{2,0}, P_{3,0}, P_{4,0})

Let T_1 = 100, T_4 = 1, and r = 1. Based on eq 22, each pool is associated with a temperature:

T_1 = 100    T_2 = 67    T_3 = 34    T_4 = 1

Population Growth. The Markov process generates points and increases the population in the partitions and pools. Similar to that in SA, the Markov process follows the Metropolis criterion:

p_{t,t+1} ≡ P(s_{i,j,t+1} | s_{i,j,t}) = min{1, exp[(f_{i,j,t} − f_{i,j,t+1})/T_j]}    (24)

where s_{i,j,t} and s_{i,j,t+1} represent the current point and the new point, respectively; f_{i,j,t} and f_{i,j,t+1} represent their objectives; p_{t,t+1} is the probability of accepting s_{i,j,t+1}; and T_j is the temperature associated with P_{j,t}. With this criterion, new points are accepted according to the following rule:

s_{i,j,t+1} = { s_{i,j,t+1}  if p_{t,t+1} ≥ R
              { s_{i,j,t}    if p_{t,t+1} < R    (25)

where R is a random number distributed uniformly in [0, 1]. As is apparent, the probability of accepting a new point is controlled by the pool temperatures: when T_j is large (small), the Markov process accepts new points with high (low) probability.

Illustration 7: Let us revisit the setting of Illustration 6 and let the optimization problem be

min f = x^2

The cascade at time 0 is D_{4,0}^h = (P_{1,0}, P_{2,0}, P_{3,0}, P_{4,0}) with

P_{1,0} = {f(x_1)} = {144}
P_{2,0} = {f(x_2)} = {121}
P_{3,0} = {f(x_3)} = {81}
P_{4,0} = {f(x_4)} = {64}

g_0^3 first maps x_4 to x_5:

x_4 → x_5 = 8/1 + 2 = 10

Based on eq 24,

p_{0,1} = exp[(64 − 100)/T_4] = 2.3 × 10^{−16}

p_{0,1} < R, and x_5 is not accepted. g_0^3 maps x_4 again, to x_6:

x_4 → x_6 = 8/2 + 2 = 6

and p_{1,2} = 1; p_{1,2} ≥ R, and x_6 is accepted. g_0^3 then maps x_6 to x_7:

x_6 → x_7 = 6/3 + 2 = 4

and p_{2,3} = 1; p_{2,3} ≥ R, and x_7 is accepted. A set of points is thus generated by g_0^3 (see the sketch below):

x_4 → (x_6, x_7)
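The acceptance walk of Illustration 7 in compact form (a sketch of eqs 24 and 25; metropolis_accept and the trial-move rule x/k + 2 are our reading of the illustration):

```python
import math
import random

def metropolis_accept(f_curr, f_new, T):
    """Metropolis criterion (eqs 24-25): accept downhill moves always,
    uphill moves with probability exp((f_curr - f_new)/T)."""
    p = min(1.0, math.exp((f_curr - f_new) / T))
    return p >= random.random()

f = lambda x: x**2
T4 = 1.0
x, k = 8.0, 1                 # start from x4 = 8 in the coldest pool
accepted = []
while len(accepted) < 2:      # reproduce the two accepted points x6, x7
    x_new = x / k + 2         # Illustration 7's trial move
    if metropolis_accept(f(x), f(x_new), T4):
        accepted.append(x_new)
        x = x_new
    k += 1
print(accepted)               # almost surely [6.0, 4.0] at T4 = 1
```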
G_0 evolves to G_3 = {G_{1,3}, G_{2,3}, G_{3,3}, G_{4,3}} with

G_{1,3} = {x_1}    G_{2,3} = {x_2}    G_{3,3} = {x_3}    G_{4,3} = {x_4, x_6, x_7}

Using f, the cascade is D_{4,3}^h = (P_{1,3}, P_{2,3}, P_{3,3}, P_{4,3}) with

P_{1,3} = {f(x_1)} = {144}
P_{2,3} = {f(x_2)} = {121}
P_{3,3} = {f(x_3)} = {81}
P_{4,3} = {f(x_4), f(x_6), f(x_7)} = {64, 36, 16}
It is obvious that p_{t,t+1} is large (small) when T_j is large (small). T_1 should be a large value, so that almost all points are accepted, and T_{n_p} should be a small value, so that only points better than the current ones are accepted.

Development of Inflections. The inflections are developed periodically, following Definition 6. To develop the inflections, let F_t^max and F_t^min be the maximum and minimum objectives in the pools at time t. The n_p pools have n_p + 1 boundaries; let us assume that the cascade is associated with n_p + 1 temperatures T_1, T_2, ..., T_{n_p+1}, one per boundary, so that the width of each pool's objective band is proportional to its temperature interval:

f_{j,t}^max − f_{j,t}^min = [(T_j − T_{j+1})/(T_1 − T_{n_p+1})](F_t^max − F_t^min)    (26)

Each pool j is associated with two boundaries, f_{j,t}^max and f_{j,t}^min:

f_{j,t}^max = [F_t^max(T_1 − T_{n_p+1}) − (F_t^max − F_t^min)(T_1 − T_j)]/(T_1 − T_{n_p+1})    (27)

f_{j,t}^min = [F_t^max(T_1 − T_{n_p+1}) − (F_t^max − F_t^min)(T_1 − T_{j+1})]/(T_1 − T_{n_p+1})    (28)

Based on eqs 27 and 28, the boundaries bracket the objectives in each pool:

f_{j,t}^max ≥ f(s_{i,j,t}) ≥ f_{j,t}^min  ∀ j
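A sketch of the boundary computation, assuming the proportional-splitting reading of eqs 26–28 given above (pool_boundaries and the example values are hypothetical):

```python
def pool_boundaries(F_max, F_min, T):
    """Boundaries (f_j^max, f_j^min) for each pool per eqs 27-28: the range
    [F_min, F_max] is split in proportion to the temperature intervals.
    T holds n_p + 1 temperatures in decreasing order."""
    span = T[0] - T[-1]
    bounds = []
    for j in range(len(T) - 1):
        f_hi = F_max - (F_max - F_min) * (T[0] - T[j]) / span
        f_lo = F_max - (F_max - F_min) * (T[0] - T[j + 1]) / span
        bounds.append((f_hi, f_lo))
    return bounds

# Hypothetical example: 3 pools, 4 temperatures, objectives in [10, 100].
for j, (hi, lo) in enumerate(pool_boundaries(100, 10, [100, 67, 34, 1]), 1):
    print(f"pool {j}: {hi:.1f} >= f >= {lo:.1f}")
# pool 1 (hottest) holds the worst objectives, pool 3 (coldest) the best
```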
" j