Ind. Eng. Chem. Res. 1993, 32, 2706-2713
Simultaneous Optimization of Chemical Flowshop Sequencing and Topology Using Genetic Algorithms

Hugh M. Cartwright* and Robert A. Long

Physical Chemistry Laboratory, Oxford University, South Parks Road, Oxford, England OX1 3QZ

Scheduling heuristics are of considerable practical importance and, accordingly, have been studied using a variety of methods. Such heuristics commonly assume a flowshop of fixed serial topology, but this assumption seriously restricts the theoretical efficiency of the flowshop. In this paper we show that both the chemical feedorder of a flowshop and the topology of the flowshop itself can be refined simultaneously using genetic algorithms. Because of the implicit parallelism of genetic algorithms, the method scales up readily to flowshops of industrial complexity.

1. Introduction
Batch processing is an area of central importance to the chemical industry. A high proportion of batch plants operate as chemical flowshops, in which a single set of processing units (reactors, dryers, centrifuges, etc.) is used in the successive production of a variety of different chemicals (Figure 1). Efficient plant operation requires that chemicals enter the flowshop in an order in which their passage through its chain of units is as close to "lock step" as possible. This requirement has been the stimulus for numerous studies on the sequencing of products in a flowshop (see, for example, Karimi and Ku (1988a,b), Kuriyan and Reklaitis (1985), and papers cited therein). In a flowshop (Reklaitis, 1982) there are the following:
(a) a set of NC chemicals to be processed;
(b) a set of NR processing units available for this purpose;
(c) a performance criterion with respect to which the feedstock sequence is to be optimized, usually the total processing time (makespan);
(d) a matrix of processing times $T_{ij}$ associated with each chemical i and unit j;
(e) a set of rules governing storage policy within the flowshop.
Studies of this problem can be traced back to early work on a simple list scheduling procedure (Johnson, 1954), the aim of which was the minimization of the makespan for a two-processor problem. Johnson's procedure guarantees optimum solutions when just two units are available. However, the generalized flowshop problem has been shown to be NP-complete (Garey et al., 1976), so determining an optimum or near-optimum sequence in flowshops that process more than a small number of chemicals is a challenging problem. Recent investigations have concentrated on the development of tractable heuristics. These often rely on Johnson's algorithm to generate a good solution for two-unit approximations to the problem, and then attempt to improve this solution by some recursive strategy.
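Johnson's two-unit procedure is short enough to sketch directly. The following Python is a minimal illustration of the textbook rule and rests on assumptions not stated in the paper: two units in series, unlimited intermediate storage between them, and the hypothetical helper names `johnson_order` and `makespan_two_units`. It is not the method developed in this work.

```python
from typing import List, Tuple


def johnson_order(times: List[Tuple[float, float]]) -> List[int]:
    """Johnson's rule for a two-unit flowshop (hypothetical helper).

    times[i] = (time on unit 1, time on unit 2) for chemical i.
    Chemicals that are faster on unit 1 than on unit 2 go first, in order of
    increasing unit-1 time; the rest go last, in order of decreasing unit-2 time.
    """
    first = sorted((i for i, (a, b) in enumerate(times) if a < b),
                   key=lambda i: times[i][0])
    last = sorted((i for i, (a, b) in enumerate(times) if a >= b),
                  key=lambda i: times[i][1], reverse=True)
    return first + last


def makespan_two_units(order: List[int],
                       times: List[Tuple[float, float]]) -> float:
    """Makespan on two units with unlimited storage between them."""
    end1 = end2 = 0.0
    for i in order:
        a, b = times[i]
        end1 += a                   # unit 1 runs batches back to back
        end2 = max(end2, end1) + b  # unit 2 waits for the batch and for itself
    return end2


times = [(3, 6), (5, 2), (1, 2), (6, 6), (7, 5)]
order = johnson_order(times)
print(order, makespan_two_units(order, times))  # [2, 0, 3, 4, 1] 24.0
```

For this small example the Johnson order gives a makespan of 24 h. The simple feedorder 0, 1, 2, 3, 4 gives 27 h.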
These strategies have included, for example, sequential swapping of adjacent chemicals in the sequence (Dannenbring, 1977), use of an approximate mixed integer linear program (MILP) formulation (Ku and Karimi, 1986), and selective swapping of chemicals to minimize the idle time of each processor (Rajagopalan and Karimi, 1987). These methods yield useful reductions in the makespan, though for systems of industrial complexity they are not guaranteed to find the "best" chemical feedorder. However, far more dramatic gains may be made if we adjust not only the order in which chemicals enter the flowshop but also the topology of the flowshop itself, through the addition of extra processing units (Figure 2). In an industrial flowshop, cost, space, and complexity considerations limit the number of units that could be introduced to operate alongside those in a serial chain. Nevertheless, adding a small number of units to increase efficiency is industrially viable. The problem of choosing a suitable flowshop topology is at once more complex than any associated with the serial flowshop shown in Figure 1 and offers more potential for makespan improvement. In practical terms, the problem reduces to selecting the best positions for a finite number of additional units and their connections to the serial line, while concurrently choosing a near-optimum feedorder for that topology. As we shall show, even a few extra units have the potential to yield significant savings in makespan, provided that a suitable method can be found to optimize their placement. In view of the economic importance of makespan minimization to the chemical industry, we have therefore investigated how chemical feedorder and flowline topology might be optimized simultaneously.

Figure 1. Symbolic illustration of a small flowshop (precursors in, products out).

Figure 2. Flowshop containing a small number of extra, movable units.

*To whom correspondence should be addressed. E-mail: [email protected].

0888-5885/93/2632-2706$04.00/0

2. Genetic Algorithms
In recent work on the scheduling of releases into a computer board manufacturing assembly (a problem that has features in common with flowshop scheduling), Cleveland and Smith (1989) used genetic algorithms (GAs) to investigate how adding extra assembly stations in parallel to the main line affected processing efficiency. Their choice of a nonlinear but fixed topology is a rather severe limitation. If this restriction were applied to scheduling in a flowshop containing extra units, it would reduce that problem to little more than a demanding variant of a serial flowshop calculation. Nevertheless, as their work illustrated, GAs are particularly powerful in scheduling tasks. Indeed, work on serial flowshops shows that genetic algorithms have considerable potential for determining optimum feedorders (Mott, 1991), and this potential encourages us to apply this intelligent search technique to flowshops of variable topology.

Problems To Which the Genetic Algorithm Can Be Applied. Genetic algorithms developed from seminal work by John Holland (1975), who showed that incorporating features of natural selection and evolution into a computer program could lead to a highly efficient and general search technique. Genetic algorithms are now widely used in applications of artificial intelligence (AI), and their power to deal with real-world scientific and engineering problems is increasingly being appreciated. GAs use a "guided random search", in which many different solutions to a problem are investigated and refined simultaneously to identify near-optimum solutions. The use of a population of solutions, rather than a single solution, contrasts strongly with more traditional methods of studying flowshop scheduling. This multiplicity of solutions is an essential feature of the GA and speeds the discovery of, and convergence to, good solutions. At the same time, it helps to prevent the calculation from converging onto suboptimal solutions in complex problem spaces.

© 1993 American Chemical Society
Furthermore, it helps to locate and improve promising solutions using a mechanism that, through implicit parallelism, has substantial speed advantages over most competing methods (Cartwright and Harris, 1993). The algorithm is a very general one and can be used for a wide variety of complex searches, provided that (a) solutions to the problem can be cast in the form of a string of numbers; (b) a "fitness", which in some way quantifies the quality of a solution, can be derived for any arbitrary string; and (c) strings in which "part" of a good solution is present are rewarded with a higher fitness than strings chosen entirely at random. In the scheduling problem, we can interpret "part" of a good solution in two ways. It may be a specified subgroup of contiguous feedstock chemicals whose progress through the flowshop is largely synchronized. Alternatively, it may be an arrangement of extra processing units that is particularly effective in reducing the makespan for a defined chemical feedorder. Each of the three features a-c is present in the flowshop scheduling problem, with or without the inclusion of variable topology.

When a problem is tackled with a GA, there is considerable freedom in how the algorithm is implemented, but every calculation involves three essential operations: (1) a method for encoding solutions to the problem in strings of decimal or binary digits; (2) an evaluation function that takes a string as input and returns a fitness value measuring the quality of the solution that the string represents; and (3) an adaptive plan, whose purpose is to produce a new, improved population of solutions using information provided by the previous generation. This plan consists principally of the evolution-like reproduction, string crossover, and mutation operators.
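For reference, the schema theorem mentioned below (Goldberg, 1989) is usually stated as the following bound. It assumes fixed-position crossover and bitwise mutation. The symbols are as follows:
- $m(H,t)$ is the number of strings in generation $t$ that contain schema $H$;
- $f(H)$ is their mean fitness, and $\bar{f}$ is the population mean fitness;
- $\delta(H)$ and $o(H)$ are the defining length and order of $H$;
- $l$ is the string length;
- $p_c$ and $p_m$ are the crossover and mutation probabilities.

The bound carries over only loosely to permutation-based feedorder strings and repair-type crossover such as that used here.

$$
m(H,\,t+1) \;\ge\; m(H,\,t)\,\frac{f(H)}{\bar{f}}\left[\,1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right]
$$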
Figure 3. Flowshop with additional units and links at stations 2 and 3. Units 2 and 2a are nonequivalent, since a chemical exiting unit 2 has a choice of two destination reactors, while no similar choice exists for chemicals leaving unit 2a. The makespan for a given feedorder will usually depend upon the route taken by each chemical when it leaves unit 1.
Mathematical treatment of the effect of this adaptive plan shows that, over a number of generations, small, high-quality sections of strings appear and proliferate in the population, while large, poor-performance sections disappear. High-quality strings are formed through the automatic "bolting together" of superior substrings as they arise during the algorithmic search. The development of solutions of increasingly high quality is described mathematically by the schema theorem (Goldberg, 1989).

3. The Scheduling Problem
The basic chemical flowshop is a serial line of NR units, which may be reactors, dryers, storage units, packing units, and so on. Each unit is connected to its two neighbors, with an entry to the line through the first unit and an exit from the line at the final unit (Figure 1). To provide the extra flexibility offered by a variable topology, we allow NE additional processing units (NE ≤ NR) to be placed at any position in parallel with the main line, so that each main-line unit may be mirrored by a second, equivalent unit beside it. [The results reported here do not cover flowshops in which more than one additional unit can be placed at a station, but recent work has shown that the model can be extended to include multiple extra units (Cartwright, 1993).] Links are installed to connect these extra units to the main line and (optionally) to each other. Further links are available to connect nonadjacent main-line reactors (Figure 3).

In this work several assumptions, common in the literature, are made about the nature of the processing plant.
1. The plant operates under an NIS (no intermediate storage) policy. The exception is any unit that has a pipe leading out of the flow line; such a unit operates under a UIS (unlimited intermediate storage) policy. (Because of the way the GA solves the problem, a flowshop operating under UIS policy can be handled without adjusting the algorithm, by introducing storage units that have zero processing times for all chemicals.)
2. The times for setup of the reaction vessel, transfer of product, processing of the chemical, and cleaning of the vessel after use are combined into a single time. Once again, modifying the algorithm to include these individual times explicitly is straightforward.
3. A unit may process only one chemical at a time.
4. A chemical may be processed in only one unit at a time.
5. Once started, a unit must complete the processing of a chemical.

The number of distinct combinations of feedorder with flowshop topology is
$$\mathrm{NC}! \times \frac{\mathrm{NR}!}{\mathrm{NE}!\,(\mathrm{NR} - \mathrm{NE})!}$$

For a system with NR = 20, NE = 10, and NC = 20 there are thus approximately 4.5 × 10^23 different combinations.
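The combination count can be checked directly. The following minimal Python sketch evaluates NC! × NR!/[NE!(NR − NE)!], that is, NC! multiplied by the binomial coefficient C(NR, NE). The function name `n_combinations` is hypothetical.

```python
from math import comb, factorial


def n_combinations(nc: int, nr: int, ne: int) -> int:
    """Distinct feedorder/topology combinations: NC! * C(NR, NE)."""
    return factorial(nc) * comb(nr, ne)


count = n_combinations(20, 20, 10)
print(f"{count:.2e}")  # 4.49e+23
```

The feedorder term dominates: 20! is about 2.4 × 10^18, whereas C(20, 10) is only 184 756.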
Table I. A Typical Processing Times Matrix
(columns: processor number)
It is unlikely that purely heuristic methods could cope successfully with a problem of this scale, so an efficient algorithm is required if high-quality solutions are to be identified.

4. Application of the Genetic Algorithm to Scheduling and Topology

Processing times matrices in which random processing times lay in the range 0-10 h were used for all runs except where stated. A typical processing times matrix is given in Table I. A GA program to tackle the combined scheduling/topology optimization problem was devised and run on a Sun 470 computer. In the AI literature, most studies involving GAs report the use of binary-coded strings. However, this type of coding is rarely appropriate for scientific problems. Decimal coding both simplifies the operation of the GA and increases the speed of calculation, so decimal coding was used throughout. Using expanded alphabets in this way does not violate the implicit parallelism of GAs (Janikow and Michalewicz, 1991). At the start of each run a randomly generated population of strings was set up, each string encoding a particular combination of chemical feedorder and topology. The feedorder was coded as a string of numbers that defined the sequence in which batches of chemicals enter the flowshop. The topology was coded by giving every possible
unit a unique identifier, which specified whether the unit was main line or subsidiary, and to which other units it was linked. A complete string consisted of the combined feedorder and topology information. The value of a string as a potential solution to the problem was found by calculating, with a linear sieve technique, the makespan of the system that the string represents. (This is a particularly straightforward method of calculating the makespan. Starting from the final unit in the flowshop, each unit is examined in turn to determine whether a chemical batch is present and, if it is, whether reaction of that chemical is complete. If reaction is finished, the batch is moved to the next unit in the flowshop, if that unit is available. If reaction is still under way, the time at which it will be complete is recorded. All units are examined in sequence, finishing with the entrance unit. If, once this process is complete, the entrance unit is empty, it is filled with the next chemical in the flowshop sequence. The elapsed processing time is then advanced to the earliest of the recorded completion times, and the process is repeated. This apparently unsophisticated method of determining the makespan is rapid, simple to code, and readily incorporated into the GA program.)

Resolution of Ambiguities. During calculation of the makespan, the presence of extra processing units can introduce ambiguity. For example, a chemical may be able to enter two different units that have nonequivalent links to later reactors (such as units 2 and 2a in Figure 3). The route chosen for the chemical may affect the makespan, and may also shuffle the order in which chemicals are processed further down the line. It is important to appreciate that in such circumstances there is no way of knowing which processing unit is "best" for the batch to enter without determining the makespan for each different route the batch might take. No search tree, for example, could be used to lead us invariably to the correct choice of unit.

Such ambiguities might be resolved in one of two ways. We could apply an empirical set of rules to handle ambiguities reproducibly as they arise, using experience to guide us to rules that, on average, tend to minimize the makespan. Alternatively, when a string is first assessed, we might resolve each ambiguity by making a random choice of which unit the batch should enter. A record of how the ambiguity was handled would then be retained within the string itself, so that the makespan for that string is always calculated subsequently in the same way. This latter method is the more flexible, and is currently the subject of further study (Cartwright, 1993). However, it poses serious computational problems, so for the present work the following empirical rules were applied in determining the makespan:

If a connection exists that allows a batch to bypass a unit in which the batch has zero processing time, that bypass connection is used, provided that the destination unit becomes available at the same time as, or earlier than, the alternative destination. (This rule allows a batch presently in unit (x - 1) and with zero processing time in unit x to move directly to unit (x + 1) and start processing, thus bypassing a potential blocking batch in unit x. This generally makes more efficient use of the units in the flowshop, and thus usually reduces the makespan.)
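The linear sieve described above can be sketched for its simplest case only. That case is a purely serial line operating under NIS, with no extra units and therefore none of the ambiguity rules. The following Python is a hypothetical illustration, not the program used in this work; the name `sieve_makespan` and the data layout `times[chemical][unit]` are assumptions. A batch that has finished but finds the next unit occupied stays where it is, which is how NIS blocking appears in the simulation.

```python
from typing import List, Optional


def sieve_makespan(order: List[int], times: List[List[float]]) -> float:
    """Makespan of a feedorder on a purely serial NIS line (hypothetical sketch).

    times[i][k] is the processing time of chemical i in unit k.
    """
    n_units = len(times[0])
    held: List[Optional[int]] = [None] * n_units  # chemical in each unit, or None
    done_at = [0.0] * n_units                      # completion time in each unit
    queue = list(order)                            # chemicals not yet fed
    now, finished = 0.0, 0
    while finished < len(order):
        moved = False
        # Sieve from the final unit back to the entrance unit.
        for k in range(n_units - 1, -1, -1):
            chem = held[k]
            if chem is None or done_at[k] > now:
                continue                           # empty, or still reacting
            if k == n_units - 1:                   # batch leaves the flowshop
                held[k] = None
                finished += 1
                moved = True
            elif held[k + 1] is None:              # next unit free: transfer
                held[k + 1], held[k] = chem, None
                done_at[k + 1] = now + times[chem][k + 1]
                moved = True
            # otherwise the batch is blocked in unit k (no intermediate storage)
        if held[0] is None and queue:              # feed the next chemical
            chem = queue.pop(0)
            held[0], done_at[0] = chem, now + times[chem][0]
            moved = True
        if not moved:
            # Advance the clock to the earliest pending completion.
            now = min(done_at[k] for k in range(n_units)
                      if held[k] is not None and done_at[k] > now)
    return now
```

Passes in which some batch moves do not advance the clock. This lets zero-time units and chains of transfers resolve at a single instant before time moves on.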
If two destination units at a single station are empty, the main-line unit is selected as the destination when a batch is ready to enter that station. (Since a main-line unit usually has at least as many exits as a subsidiary unit, this is in general likely to lead to efficient processing of batches.)

If batches in two equivalent units are both ready to move into a single destination unit, the batch in the main-line unit is moved first. (Unless the number of connections to non-main-line units is large, a batch in a main-line unit usually has more destination units available to it when it is ready to move on. It is therefore usually beneficial to vacate a main-line unit rather than a subsidiary unit, to free it for an incoming batch.)

Determination of the Fitness. After the makespans have been calculated for each string, fitnesses must be derived from them. The success of a GA calculation is intimately related to how well the algorithm can discriminate between solutions of varying effectiveness. It is common, therefore, to apply some form of scaling to the raw fitnesses (in this case the makespans) generated by the algorithm, with the aim of promoting the development of high-quality solutions. We can see how scaling suited to the scheduling problem can be applied by considering its operation with some typical data. Column 2 of Table II shows the makespans for a representative selection of strings, several hundred generations into a calculation. Since the fittest strings are those with the lowest makespan, it is reasonable to relate the fitness to the inverse of the makespan (column 3). However, the range of values thus generated is rather small (4.975 × 10^-3 to 4.425 × 10^-3 h^-1), and the range is even smaller among the better strings (4.975 × 10^-3 to 4.808 × 10^-3 h^-1).

Table II. Calculation of Fitness from Makespan
string   makespan (h)   1/makespan (10^-3 h^-1)   adjusted makespan (h)   fitness (10^-2 h^-1)
1        207            4.831                     27                      3.704
2        204            4.902                     24                      4.167
3        204            4.902                     24                      4.167
4        219            4.566                     39                      2.564
5        201            4.975                     21                      4.762
6        203            4.926                     23                      4.348
7        204            4.902                     24                      4.167
8        203            4.926                     23                      4.348
9        202            4.950                     22                      4.545
10       226            4.425                     46                      2.174
11       203            4.926                     23                      4.348
12       206            4.854                     26                      3.846
13       206            4.854                     26                      3.846
14       203            4.926                     23                      4.348
15       203            4.926                     23                      4.348
16       206            4.854                     26                      3.846
17       206            4.854                     26                      3.846
18       201            4.975                     21                      4.762
19       201            4.975                     21                      4.762
20       203            4.926                     23                      4.348
21       201            4.975                     21                      4.762
22       208            4.808                     28                      3.571
23       201            4.975                     21                      4.762
24       204            4.902                     24                      4.167

The genetic algorithm relies upon an ability to select "good" strings on the basis of their fitness. To the GA, each of the good strings in Table II seems almost equally promising, because their makespan inverses are nearly identical. To the chemical engineer, however, a string with a makespan of 201 h is significantly superior to one with a makespan of 208 h. To stretch the range of fitnesses, a value equal to roughly 90% of the estimated optimum makespan is subtracted from the makespan of each string before inversion. (There is nothing magical about this value of 90%. Our experiments suggest that such a value provides about the right degree of encouragement to the algorithm to discard poor solutions and select good ones. If the fitnesses are scaled too strongly by use of too large a value, the fitness range becomes very great. Extreme evolutionary pressure on all strings then forces the algorithm to converge rapidly, usually to a suboptimal solution. On the other hand, if the scaling is too timid, the algorithm converges only slowly, and very long computational times may be required to reach an acceptable solution. A preliminary run without scaling allows one to estimate the best possible makespan reasonably accurately, and from this value one can move to the full calculation with scaling.) The makespans in Table II were adjusted by subtracting 180 h (column 4) before inversion to yield a fitness, as shown in column 5. The range of fitness for good strings is now 4.762 × 10^-2 to 3.571 × 10^-2 h^-1, and the algorithm is better able to identify the strings of highest quality.

A reproduction scheme including single elitism was used to create a new population from the old. In this procedure, one copy of the best string in the current generation was made and placed directly into the next generation, in a position protected from crossover and mutation.
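The fitness scaling and the reproduction step described in this part of the paper can be sketched as follows. This is a hypothetical Python illustration, and the function names are invented. It assumes a mating pool of N − 1 strings, so that the pool plus the elite copy keeps the population size N constant. It also assumes that the remaining places are filled by a roulette wheel on the fractional parts of the expected copy counts, which is one common variant of stochastic remainder selection. The last line reproduces two entries of the fitness column in Table II.

```python
import random
from typing import List, Sequence, Tuple


def scaled_fitness(makespans: Sequence[float], offset: float) -> List[float]:
    """Fitness = 1 / (makespan - offset); offset is ~90% of the estimated optimum."""
    if any(m <= offset for m in makespans):
        raise ValueError("offset must be smaller than every makespan")
    return [1.0 / (m - offset) for m in makespans]


def stochastic_remainder(population: Sequence[List[int]],
                         fitness: Sequence[float], pool_size: int,
                         rng: random.Random) -> List[List[int]]:
    """Fill a mating pool by stochastic remainder selection."""
    total = sum(fitness)
    # Expected copies; equals fitness / mean fitness when pool_size == len(population).
    expected = [f * pool_size / total for f in fitness]
    pool: List[List[int]] = []
    for string, e in zip(population, expected):
        pool.extend(list(string) for _ in range(int(e)))  # integer part
    fractions = [e - int(e) for e in expected]
    weights = fractions if sum(fractions) > 0 else list(fitness)
    while len(pool) < pool_size:
        # Remaining places: roulette wheel weighted by the fractional parts.
        pool.append(list(rng.choices(population, weights=weights)[0]))
    return pool


def reproduce(population: Sequence[List[int]], makespans: Sequence[float],
              offset: float, rng: random.Random
              ) -> Tuple[List[int], List[List[int]]]:
    """Single-string elitism, then a mating pool for the remaining places."""
    fitness = scaled_fitness(makespans, offset)
    best = max(range(len(population)), key=lambda i: fitness[i])
    elite = list(population[best])  # protected from crossover and mutation
    pool = stochastic_remainder(population, fitness, len(population) - 1, rng)
    return elite, pool


print([round(100 * f, 3) for f in scaled_fitness([201, 208], 180)])  # [4.762, 3.571]
```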
(This "elitist" step ensures that the best string in generation (n + 1) is never worse than the best string in generation (n). It protects against the possibility that the best string might be disrupted by the remaining evolutionary operators and thus inadvertently lost from the population.) The mating (crossover) pool was then filled using stochastic remainder selection. Each string was copied into the mating pool a number of times equal to the integer part of its fitness, expressed relative to the population average fitness. All above-average strings thus received at least one copy in the next generation. Remaining places in the mating pool were allocated by a pseudo-random process in which strings were selected with a probability that depended upon the decimal part of their fitness. Strings in the mating pool were then submitted to crossover and mutation.

String Crossover. Chemical feedorder crossover was performed using an adapted version (Mott, 1991) of Goldberg and Lingle's PMX operator (Goldberg and Lingle, 1985). In most problems tackled with the genetic algorithm, any crossover operation yields valid strings. In this problem, however, crossover easily generates strings in which one or more chemical batches appear twice and other batches do not appear at all. Since every batch must be processed by the flowshop exactly once, such strings do not represent valid solutions to the processing problem. Goldberg and Lingle proposed a method for adjusting such invalid strings to remove duplicate entries. However, their method tends to be disruptive of strings, so an alternative procedure has been developed that causes less disruption to strings upon crossover. When two strings are selected for crossover, a segment covering the same number of batch positions is selected at random in each string, and the two segments are exchanged. For example, in the following two strings, the segments to be exchanged are shown in bold face:
string A   string B
7 2
1
4
9
17 8
9
8
17
16
18 7
6
6 1
14 13
...
...
This leads to the two new strings:
string A   string B
2
16
6
2
9
8
2 JJ
1
8
1 4
18
6
9
JJ
14 13
... ...
in which duplicate batch numbers are underlined. The two new strings are then analyzed from their first batch position. When a duplicate chemical is found, it is swapped with the first duplicate batch in the second string,
string A   string B
7 2
16 9
6
9
8
1 1
8
4
18 7
6 JJ
14 13
... ...
and the process is continued until all duplicates have been removed:
string A   string B
7          2
16         9
6          8
9          17
1          1
8          4
18         7
17         6
14         13
...        ...
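The exchange-and-repair crossover illustrated above can be sketched in Python as follows. The sketch is hypothetical and makes two assumptions. First, the exchanged segments occupy the same positions in both strings. Second, only the copies outside the exchanged segment are swapped during repair, so the segment just received is preserved. The names `exchange_and_repair` and `random_crossover` are invented for illustration.

```python
import random
from typing import List, Tuple


def exchange_and_repair(a: List[int], b: List[int], start: int, end: int
                        ) -> Tuple[List[int], List[int]]:
    """Exchange a[start:end] with b[start:end], then swap duplicates between strings."""
    child_a = a[:start] + b[start:end] + a[end:]
    child_b = b[:start] + a[start:end] + b[end:]

    def outside_duplicates(child: List[int]) -> List[int]:
        # Positions outside the segment holding a batch that the segment also holds.
        segment = set(child[start:end])
        return [i for i, batch in enumerate(child)
                if not start <= i < end and batch in segment]

    # Both lists have the same length: the batches duplicated in one child are
    # exactly the batches missing from the other.
    for i, j in zip(outside_duplicates(child_a), outside_duplicates(child_b)):
        child_a[i], child_b[j] = child_b[j], child_a[i]
    return child_a, child_b


def random_crossover(a: List[int], b: List[int], rng: random.Random
                     ) -> Tuple[List[int], List[int]]:
    """Pick a random segment (same positions in both strings) and cross."""
    start = rng.randrange(len(a))
    end = rng.randrange(start + 1, len(a) + 1)
    return exchange_and_repair(a, b, start, end)


print(exchange_and_repair([1, 2, 3, 4, 5, 6], [6, 5, 4, 3, 2, 1], 1, 3))
# ([1, 5, 4, 3, 2, 6], [6, 2, 3, 4, 5, 1])
```

Each swap gives the first child one of its missing batches and the second child one of its own missing batches. This is why the procedure always ends with two valid feedorders.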
Since the number of duplicates in the two strings must be identical, and batch numbers missing from one string must appear twice in the other, this process always yields valid strings, and does so with minimal disruption of the string order.

In topology crossover, all extra processing units between two points on one string were swapped with the extra units between the corresponding points on another string. Connections were maintained where possible, and any connections needed to ensure that a new unit was not isolated were put into place. The strings and the crossover points were chosen at random. However, the sections chosen for crossover were checked to ensure that they contained the same number of extra reactors. This prevented the generation of strings coding for more than the maximum allowed number of additional units.

String Mutation. The chemical feedorder was mutated at a low rate by swapping the positions of two batches in a randomly chosen string. Topology was mutated at a similar rate by removing a single, randomly selected extra unit from a string and reinserting it in the same string at an arbitrary new position. Connections to and from the previous position of the unit were broken, and connections were made to the new position. Crossover was always applied to both the feedorder and the topology of the same string, since a good feedorder soon becomes associated with an appropriate topology. The same restriction was not applied to mutation: there is no a priori reason why mutating topology and feedorder in different strings should be any less constructive than applying both operators to a single string.

Settings of Variable Parameters. An important aspect of the use of GAs is the determination of suitable values for the variable parameters that control the operation of the algorithm, such as the population size and the crossover and mutation rates. The rate of convergence of the algorithm, and to some degree the success of the calculation, depends upon the selection of appropriate values. This selection is not trivial, but suitable values are related to the complexity and nature of the problem, and the body of GA literature provides a useful guide to approximate values. From repeated trials using randomly chosen processing times and starting strings, the effect of adjusting each variable parameter on the success of the calculation can be judged. The parameter set found to be most suited to the problem is shown in Table III.

As the calculation proceeds and the GA converges on good solutions, the rate of makespan improvement gradually diminishes (Figure 4). At later stages in the calculation, the GA was assisted by allowing it to engage in a traveling local search to improve both the topology and the batch feedorder. This is a hill-climbing modification that improves the population by searching the set of solutions in the immediate vicinity of each string in the problem space.

Figure 4. Variation of the average makespan of the strings in a population, and the makespan of the best string in the population, with generation number. (Axes: makespan/h, 190-270; generation number, 1-1001.)

Table III. GA Parameters Used for Calculations
population size         35
reproduction operator   stochastic remainder, single-string elitism
string optimization     traveling local search
crossover rate          0.3 per string per generation
mutation rate           0.001 per string position per generation
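Using the mutation rate in Table III, the feedorder swap mutation described under String Mutation can be sketched as follows. The per-position interpretation is an assumption made for illustration: each position, with probability equal to the rate, is swapped with a different randomly chosen position. The name `swap_mutation` is also hypothetical. The topology mutation depends on the details of the string encoding and is not sketched.

```python
import random
from typing import List, Sequence

MUTATION_RATE = 0.001  # per string position per generation (Table III)


def swap_mutation(order: Sequence[int], rate: float,
                  rng: random.Random) -> List[int]:
    """Copy of a feedorder in which each position, with probability `rate`,
    is swapped with another randomly chosen position."""
    child = list(order)
    n = len(child)
    for i in range(n):
        if n > 1 and rng.random() < rate:
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # ensure the partner position differs from i
            child[i], child[j] = child[j], child[i]
    return child


mutant = swap_mutation(list(range(20)), MUTATION_RATE, random.Random(0))
```

Because a swap only rearranges batches, the mutated string is always a valid feedorder and needs no repair step.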
In this local search, the batch order is changed by pairwise swaps (Figure 5) until an order is found that improves the makespan. When such an improvement is found, the pairwise swaps are resumed at the start of the updated string. The process continues until a cycle is completed without any improvement in makespan. A similar procedure is used for local search of the topology. Traveling local search is a simple and productive addition to the GA search. However, it is comparatively time-consuming, and its primary objective is to introduce new genetic material on which the GA can work once the rate of convergence starts to diminish. It is thus an adjunct to the GA, rather than an attempt at independent optimization in its own right.

Figure 5. Feedorder pairwise swap procedure used during a local search. The first chemical is swapped successively with the second, third, ... in a string. If an improved string is generated, the process repeats, using the new string as the starting point.

Figure 6. Number of times a particular makespan was found in a sample of 100 000 randomly chosen feedorder/topology combinations. The best makespan found by the GA for this system was 197 h.
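The pairwise-swap local search can be sketched as follows. This hypothetical Python sketch assumes that, after the first chemical has been tried against every later position, the search moves on to the second chemical, and so on, so that one cycle covers all pairs. The cost function is passed in as a parameter. In the usage example a toy cost, the number of out-of-order pairs, stands in for the makespan so that the result is easy to check by eye. The names `pairwise_swap_search` and `inversions` are invented.

```python
from typing import Callable, List, Sequence


def pairwise_swap_search(order: Sequence[int],
                         cost: Callable[[List[int]], float]) -> List[int]:
    """Accept the first improving pairwise swap, then restart from the start of
    the updated string; stop after a full cycle with no improvement."""
    best = list(order)
    best_cost = cost(best)
    improved = True
    while improved:
        improved = False
        for i in range(len(best) - 1):
            for j in range(i + 1, len(best)):
                trial = list(best)
                trial[i], trial[j] = trial[j], trial[i]
                trial_cost = cost(trial)
                if trial_cost < best_cost:
                    best, best_cost, improved = trial, trial_cost, True
                    break
            if improved:
                break  # resume swaps at the start of the improved string
    return best


def inversions(p: List[int]) -> int:
    """Toy stand-in for the makespan: number of out-of-order pairs."""
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p)))


print(pairwise_swap_search([3, 7, 13, 5, 8, 15], inversions))  # [3, 5, 7, 8, 13, 15]
```

In the real calculation the cost would be the makespan of the string, for example from a sieve calculation, and each accepted swap would cost one full makespan evaluation per trial. This is why the search is comparatively time-consuming.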
6. Results and Analysis
Is Simultaneous Optimization Necessary? The addition of extra units to a serial flow l i e will substantially increase the difficulty of finding a near-optimum feedorder only if it is not possible to optimize the feedorder and the topology independently. I t is therefore important to establish whether simultaneous refinement of solutions is necessary, or whether the computationally more straightforward problem of determining a high-quality topology and feedorder independently can be used instead. The most direct way to investigate this is to use the GA to find a near-optimum feedorder for a serial line and then determine the best topology for the addition of extra reactors to the line for this feedorder. The results of this procedure (and the related one in which a topology is optimized for a random starting feedorder and the best feedorder is then found for that topology)can be compared with results yielded by calculations in which both feedorder and topology are optimized simultaneously. A series of experiments was conducted to find the best feedorder for a flowshop with NC = 20,NR = 20,NE = 10and a randomly chosen set of processing times. Results from typical experiments covering three different processing times matrices are given in Table IV. The table indicates clearly the gains to be made if both topology and feedorder are optimized together. Quality of GA Makespans. Since there appear to be no comparable data in the literature to which we can compare the results yielded by the GA, we have performed distribution analysis on a randomly generated set of solutions. The purpose of this analysis is to make an approximate determination of how the results found by the GA compare to typical makespans for randomly selected feedorderltopology combinations. The makespans were calculated for 100 000 random topologylfeedorder combinations for a system in which
04 6
I
8
IO
12
14
16
18
20
System size
Figure 7. Variation of the number of solutions and the time to GA convergence with ‘system size” (see text). Table IV. Comparison of the Best Makespans (in h) Found by Independent and Simultaneous Optimization of Topology and Feedorder for Three Different Data Sets set 1 set2 set3 optimize topology with random feedorder 231 246 260 optimize feedorder with random topology 232 241 260 optimize topology then feedorder 225 233 241 optimize feedorder then topology 225 232 251 optimize feedorder and topology simultaneously 197 201 211
20 chemicals passed through 20 units, with 10 additional units available. The makespans ranged from 226 to 311 h. The frequency distribution of makespans is shown in Figure 6 and is approximately Gaussian in form. A GA optimization for this system, adjusting both feedorder and topology simultaneously, found a best makespan of 197 h. We have been unable to prove the distribution of makespans shown in Figure 6 is Gaussian, but if this is assumed to be a reasonable approximation to the true distribution, the frequency with which makespans of 197 h or better will occur can be estimated. This calculation suggests that a makespan no worse than 197 h will be found once in approximately 10” randomly chosen solutions. The GA located a solution with a makespan of 197 h after generating and assessing roughly 70 000 solutions. The advantage of a GA calculation is clear, and rapidly becomes more pronounced as the size of the problem grows. GA Efficiency. To test the efficiency of the GA, a series of runs was conducted in which the “system size” (defined to be equal to NC, NR, and NEX 2)was increased from 6 to 20. The calculation time to convergence and the number of solutions to the problem are shown as a function of system size in Figure 7. The time taken by the GA to converge increases far more slowly than the increase in the number of solutions. This is primarily due to the implicit parallelism of the
GA, and it illustrates that the algorithm remains computationally reasonable even for large-scale problems.

Figure 8. Possible positions for an extra link on the serial line.

Figure 9. Two possible links that might be used by a chemical that has zero processing time in units 7 and 8.

Placement of Topological Features. As we have shown, separate GA optimization of topology and feedorder is not a feasible route to high-quality solutions. However, a suitable topology might conceivably first be derived through simple heuristics based on the matrix of processing times. An appropriate sequence might then be found for this topology using standard methods. A series of experiments showed that simple heuristics do not reliably produce good positions for extra connections, reactors, or storage vessels in the flowshop problem. In the first of these experiments, no extra units were added to a serial line of 20 units. Instead, each unit was allowed a single extra link to a nonadjacent unit (Figure 8). Inspection of a typical randomly generated processing-times matrix showed that in several units the processing time of at least one batch (in a total feed of 20) was zero. A link that bypassed such a unit was therefore potentially usable. One batch had zero processing time in two consecutive units (units 7 and 8), and it was the only batch that could bypass unit 7. An extra forward link made from unit 6 could bypass either unit 7 alone or both units 7 and 8 (Figure 9). In either case only one batch could use the extra connection. A reasonable heuristic might therefore be to make the longer link, allowing that batch to move through the flowshop as quickly as possible. The reasoning is that if the batch unnecessarily entered unit 8, it might temporarily block the processing of another batch. When the GA was run to optimize the configuration of this system, the best solution found usually contained a direct link from unit 6 to unit 8. This held even when a link between units 6 and 9 was explicitly inserted into some starting strings, and the best strings never contained the longer link.
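The "longer link" heuristic discussed above can be made concrete. The Python sketch below is an illustration only, not the procedure used in this work. It scans a processing-times matrix for batches with zero time in consecutive units and lists every forward link each such batch could use. It then applies a hypothetical `longest_link_heuristic`, which always prefers the link that skips the most units. The function names and the toy matrix are invented for illustration. On a matrix shaped like the case in the text, the heuristic chooses the link from unit 6 to unit 9. The GA runs reported above favoured the link from unit 6 to unit 8.

```python
from collections import defaultdict


def candidate_bypass_links(times):
    """Map each usable forward link (src, dst) to the batches that could use it.

    times[i][j] is the processing time of batch i in serial unit j (0-based).
    A link src -> dst skips units src+1 .. dst-1 and is usable by batch i only
    if batch i has zero processing time in every skipped unit. Units and
    batches are reported 1-based to match the numbering in the text.
    Assumption: links that bypass the final unit (straight to product) are ignored.
    """
    n_units = len(times[0])
    links = defaultdict(list)
    for i, row in enumerate(times):
        for src in range(n_units - 2):
            dst = src + 2
            # Lengthen the link while the last skipped unit (dst - 1) has zero time.
            while dst < n_units and row[dst - 1] == 0:
                links[(src + 1, dst + 1)].append(i + 1)
                dst += 1
    return dict(links)


def longest_link_heuristic(links, src):
    """Hypothetical 'longest link' rule: from unit src, skip as many units as possible."""
    options = [link for link in links if link[0] == src]
    if not options:
        return None
    return max(options, key=lambda link: link[1] - link[0])


if __name__ == "__main__":
    # Invented 3-batch, 10-unit matrix: batch 2 has zero time in units 7 and 8.
    times = [
        [3, 4, 2, 5, 1, 6, 2, 3, 4, 2],
        [2, 3, 4, 1, 5, 2, 0, 0, 3, 4],
        [4, 1, 3, 2, 2, 5, 1, 2, 2, 3],
    ]
    links = candidate_bypass_links(times)
    print(links)  # {(6, 8): [2], (6, 9): [2], (7, 9): [2]}
    print(longest_link_heuristic(links, 6))  # (6, 9)
```

The enumeration only says which links a batch *could* use. It says nothing about whether using them shortens the makespan, and that is the question the GA answers by evaluating complete schedules.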
It was clear that the presence of the longer connection was actually counterproductive. Further trials confirmed that this was not unusual behavior. There are (at least) two possible explanations for this:

1. If a batch bypasses an occupied unit, the processing order is thereby altered. Bypassing two units will produce a new feedorder which may be (though it need not be) less efficient than the order produced if the batch bypasses only a single unit or none at all.
2. When the batch which can bypass units 7 and 8 completes processing in unit 6, there may be batches resident in units 7 and 9. If unit 8 is empty and another batch is waiting to enter unit 6, a connection to unit 8 will allow unit 8 to be used as a storage vessel for the batch
currently in unit 6, thus allowing the batch in unit 5 to move into unit 6 and begin processing.

In our consideration of simple heuristics, we also examined whether the best placement of a fixed number of extra units might be determined from the processing-times matrix. The idea was to place extra units in parallel with those units of the serial line in which chemicals have the largest individual, or highest average, processing times. However, the GA shows that these are poor heuristics for unit placement. To illustrate this, the processing-times matrix was altered by increasing a small number of randomly chosen processing times to 11 (a larger value than any other in the matrix). Extra units were placed in parallel with the units at which the increased processing times occurred. The GA was then run to optimize the feedorder without changing the topology. The results were compared with those found when the GA was allowed to alter the topology during the course of the calculation. With fixed topology, the best makespan found was 225 h. Altering the original topology reduced the makespan to 219 h. Significantly, most of the extra units were positioned differently in the final and starting topologies. Repeated experiments show this to be typical behavior. We have also investigated placing extra units in parallel with those units in which the average processing times were greatest. GA experiments suggest that such a topology is little better than one chosen at random. It is therefore unlikely that the optimum placement of extra units can be derived solely from simple heuristics such as these, which rely upon inspection of the processing-times matrix. Evidently the temporary blockages in the flow line which extra units help to overcome are not necessarily caused by chemicals in the reactors with the largest processing times.
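The two unit-placement heuristics tested above rank the columns of the processing-times matrix. One rule ranks by largest individual processing time, the other by highest average processing time. The sketch below shows how such heuristic topologies could be generated, under stated assumptions. The function `place_parallel_units`, the limit of one extra unit per serial stage, the tie-breaking rule, and the example matrix are all hypothetical. The sketch is not the authors' code. As the GA results show, the units these rules select are often not where extra capacity is most useful.

```python
from statistics import mean


def place_parallel_units(times, n_extra, rule="max"):
    """Return the 1-based serial units that each receive one parallel extra unit.

    times[i][j] is the processing time of chemical i in serial unit j.
    rule="max":  rank units by their largest individual processing time.
    rule="mean": rank units by their average processing time.
    Assumptions: at most one extra unit per serial unit, and ties are broken
    in favour of the lower-numbered unit (an arbitrary choice).
    """
    n_units = len(times[0])
    columns = [[row[j] for row in times] for j in range(n_units)]
    if rule == "max":
        score = [max(col) for col in columns]
    elif rule == "mean":
        score = [mean(col) for col in columns]
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    # Highest score first; unit index as secondary key for deterministic ties.
    ranked = sorted(range(n_units), key=lambda j: (-score[j], j))
    return sorted(j + 1 for j in ranked[:n_extra])


if __name__ == "__main__":
    # Invented 3-chemical, 4-unit matrix with one time raised to 11,
    # echoing the altered-matrix experiment described in the text.
    times = [
        [2, 11, 3, 4],
        [5, 1, 6, 2],
        [4, 2, 7, 3],
    ]
    print(place_parallel_units(times, 1, "max"))   # [2]
    print(place_parallel_units(times, 1, "mean"))  # [3]
```

Even on this tiny matrix the two rules disagree. Neither rule accounts for holdups that propagate backward along the line. A topology generated this way would still have to be assessed by evaluating complete schedules.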
Blockages may start near the end of the line as small holdups, but their effect may be magnified as the holdup propagates backward through the flowshop. Consequently, for a system of even moderate complexity, the positions of major blockages are not easily predicted. The positioning of storage vessels has similarly been shown not to be amenable to simple heuristics. For example, GA experiments show that the optimum position for a single storage vessel is not necessarily immediately before the unit with the largest, or highest average, processing time. The same is true when a limited number of storage units are available.

6. Conclusion
The optimization of both chemical feedorder and topology in a chemical flowshop is a complex problem. Attempts to optimize feedorder and topology independently lead to solutions well removed from the optimum. A genetic algorithm provides a fast and reliable method of tackling this problem. It also demonstrates that combining simple heuristics with conventional sequencing approaches is unlikely to be effective in finding near-optimum feedorder/topology combinations.

Acknowledgment

We are grateful to Sun Microsystems for their support of research at the Physical Chemistry Laboratory, Oxford University, and to the referees for their constructive comments.
Literature Cited

Cartwright, H. M. Oxford University, England. Unpublished work, 1993.
Cartwright, H. M.; Harris, S. P. Analysis of the Distribution of Airborne Pollution using Genetic Algorithms. Atmos. Environ. 1993, in press.
Cleveland, D. A.; Smith, F. S. In Proceedings of the Third International Conference on Genetic Algorithms; Morgan Kaufmann: San Mateo, CA, 1989.
Dannenbring, D. G. An Evaluation of Flowshop Sequencing Heuristics. Manage. Sci. 1977, 23, 1174-1182.
Garey, M. R.; Johnson, D. S.; Sethi, R. Complexity of Flow Shop and Job Shop Scheduling. Math. Oper. Res. 1976, 1, 117-129.
Goldberg, D. E. Genetic Algorithms in Search, Optimization, and Machine Learning; Addison-Wesley: Reading, MA, 1989.
Goldberg, D. E.; Lingle, R. Alleles, Loci and the Travelling Salesman Problem. In Proceedings of the First International Conference on Genetic Algorithms; Morgan Kaufmann: 1985.
Holland, J. H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, 1975.
Janikow, C. Z.; Michalewicz, Z. An Experimental Comparison of Binary and Floating Point Representations in Genetic Algorithms. In Proceedings of the Fourth International Conference on Genetic Algorithms; Morgan Kaufmann: San Mateo, CA, 1991.
Johnson, S. M. Optimal Two and Three Stage Production Schedules with Set-up Times Included. Naval Res. Logist. Q. 1954, 1, 61.
Karimi, I. A.; Ku, H. M. A Modified Heuristic for an Initial Sequence in Flowshop Scheduling. Ind. Eng. Chem. Res. 1988a, 27, 1654-1658.
Karimi, I. A.; Ku, H. M. Scheduling in Serial Multiproduct Batch Processes with Finite Interstage Storage: A Mixed Integer Linear Program Formulation. Ind. Eng. Chem. Res. 1988b, 27, 1840-1848.
Ku, H. M.; Karimi, I. A. Scheduling in Multistage Serial Batch Processes with Finite Intermediate Storage, Part 1: MILP Formulation. Presented at the AIChE Annual Meeting, Miami, 1986.
Kuriyan, K.; Reklaitis, G. V. Approximate Scheduling Algorithms for Network Flowshops. Ind. Chem. Eng. Symp. Ser. 1985, 92, 79.
Mott, G. F. Optimising Flowshop Scheduling Through Adaptive Genetic Algorithms. Chemistry Part II Thesis, Oxford University, 1991.
Rajagopalan, D.; Karimi, I. A. Completion Times in Serial Mixed-storage Multiproduct Processes with Transfer and Set-up Times. Comput. Chem. Eng. 1989, 13, 175-186.
Reklaitis, G. V. AIChE Symp. Ser. 1982, 78 (No. 214), 119-133.
Received for review December 7, 1992. Revised manuscript received July 1, 1993. Accepted July 14, 1993.

Abstract published in Advance ACS Abstracts, October 1, 1993.