Article pubs.acs.org/IECR
Large-Scale Refinery Crude Oil Scheduling by Integrating Graph Representation and Genetic Algorithm Manojkumar Ramteke†,§ and Rajagopalan Srinivasan*,†,‡ †
Institute of Chemical and Engineering Sciences, Agency for Science, Technology & Research (A*STAR), 1 Pesek Road, Jurong Island, Singapore 627833 ‡ Department of Chemical & Biomolecular Engineering, National University of Singapore, 4 Engineering Drive 4, Singapore 117576 S Supporting Information *
ABSTRACT: Scheduling is widely studied in process systems engineering and is typically solved using mathematical programming. Although popular for many other optimization problems, evolutionary algorithms have not found wide applicability in such combinatorial optimization problems with large numbers of variables and constraints. Here we demonstrate that scheduling problems that involve a process network of units and streams have a graph structure which can be exploited to offer a sparse problem representation that enables efficient stochastic optimization. In the proposed structure adapted genetic algorithm, SAGA, only the subgraph of the process network that is active in any period is explicitly represented in the chromosome. This leads to a significant reduction in the representation, but additionally, most constraints can be enforced without the need for a penalty function. The resulting benefits in terms of improved search quality and computational performance are established by studying 24 different crude oil operations scheduling problems from the literature.
1. INTRODUCTION Scheduling of real-life networks is a common and important problem. Examples from the chemical process industry include batch processes, polymer grades, and refinery operations. Conventionally, these problems are modeled using mathematical programming techniques1−6 as reviewed by Mendez et al.7 and Floudas and Gounaris8 and commonly solved using deterministic optimization methods. Despite their widespread use, deterministic methods have drawbacks in terms of robustness and convergence when problem size becomes large, multiple objectives are involved, or multisolutions are needed. Further, the presence of nonlinearities makes solution difficult. Metaheuristic methods,9−13 such as genetic algorithms (GAs), are widely recognized as a practical approach for overcoming these challenges. While they do not guarantee a global optimum solution, their main attraction arises from their ability to obtain good quality solutions even for large-scale nonlinear problems within reasonable computational times. While GAs have been widely used for scheduling of discrete10,11 and batch14−16 production, their application to continuous processes has not received as much attention. This is probably due to the fact that these problems commonly involve large numbers of continuous variables and constraints. When handling continuous variables, conventional GAs do not take advantage of the efficiency offered by linear programming (LP) solvers although researchers have used evolutionary algorithms in hybrid forms with LP solvers17−21 and heuristics.22−25 Also, although some studies have reported techniques for a priori constraint handling,26,27 GAs are not inherently suited to handle large numbers of constraints. In this paper, we propose a graph-based chromosome representation, called structure adapted GA (SAGA), which enables a sparse formulation and hence computational efficiency for solving large-scale continuous process scheduling problems. The rest of the paper is organized as follows: Next, a © 2012 American Chemical Society
brief overview of conventional GA is presented with an illustrative scheduling example. The proposed sparse representation relies on a graph-based template of the process both to formulate the problem and to represent the solution in the chromosome. Key concepts from graph theory are summarized in section 2. The graph-based sparse representation of schedules and scheduling problems is also presented there. Section 3 describes how this representation can be incorporated into a GA through suitable design of chromosomes and genetic operators. This leads to a significant parsimony in representation in terms of both the number of variables and the number of constraints to depict the problem. The resulting reduction in problem size and solution speed is demonstrated through a number of refinery crude oil scheduling problems in section 4. 1.1. Conventional GA. GA is a popular13 evolutionary optimization technique based on the biological principles of evolution and genetics. In conventional GA, all the problem variables are mapped one-to-one onto GA variables. The complete set of GA variables constitutes a chromosome. A chromosome can be represented in either binary or real form. During execution of the GA, a population of Np chromosomes is first generated randomly (first generation). In these chromosomes, variables are assigned random values within their allowed bounds. The values of the variables are used to calculate the objective function(s) value(s) as per the model equations (i.e., model evaluation). The objective functions are then converted to a fitness function which serves as an indicator of the suitability of each chromosome as a solution to the given problem. In a variant called the “nondominated sorting genetic algorithm”, Received: Revised: Accepted: Published: 5256
June 16, 2011 March 5, 2012 March 8, 2012 March 8, 2012 dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
NSGA-II, that is used for multiobjective optimization problems, chromosomes are further classified into “ranks” or “fronts” based on the concept of nondomination.10,11 Also, the Euclidean distance in the fitness-function space between nearest neighbors, referred to as the “crowding distance”, is evaluated for all the chromosomes. Two chromosomes are picked randomly from the population of parent chromosomes, and the better of the two, based on rank and the crowding distance, is copied into a mating pool. This is known as tournament selection. Np parent chromosomes are thus copied for future operations (reproduction). This procedure may lead to some inferior chromosomes also finding their way into the mating pool; this is a characteristic of GA, since it is possible that a poor chromosome may give a good daughter. Two chromosomes are picked randomly from these Np parents in the mating pool to undergo genetic operations―such as crossover, mutation, and jumping gene operation12,13―to produce two daughter chromosomes. The Np daughters so produced are mixed with the Np parents. These 2Np chromosomes are re-ranked, and the best Np chromosomes are selected from these. The process of picking the final Np chromosomes from the (2Np) chromosomes is called elitism. These Np chromosomes are then used as the starting population for the next generation. This process continues for a userspecified number of generations, maxgen, after which the Pareto front is not expected to change much. In conventional GA, constraints are handled using penalty functions. Penalties are negative for maximization and positive for minimization problems. In case of constraint violation, these penalty functions are added (with appropriate sign) to the fitness function values. Two types of penalties can be differentiated. Hard penalties add a fixed (large) number for any constraint violation, whereas bracketed penalty functions add a value weighted by the extent of constraint violation. The fitness function seeks to produce a promising search direction, but this effect is significantly diluted by the penalty functions. This makes GA inefficient when there are a large number of constraints in the problem. Further, in conventional GA, the chromosome represents all the problem variables even if most of these have to be necessarily zero due to constraints. This leads to a large chromosome size which also adversely affects search efficiency. Next, we illustrate how a conventional GA is used for scheduling optimization using a simple example. 1.2. Illustrative Problem. A general scheduling problem is shown in Figure 1. A plant receives raw material by ships. In
(a) At any time a parcel can be unloaded into at most one silo. (b) A silo that is being filled cannot simultaneously feed a PU. (c) The maximum capacity of each silo of 300 units should not be exceeded. (d) The supply of material to a PU should not stop. (e) Maximum and minimum rates of charging from silos to PU are 100 and 50 units/period, respectively. (f) Parcels are unloaded in a prespecified sequence with Parcel 1 preceding Parcel 2. (g) All parcels should be unloaded by the end of the time horizon. The objective function is to maximize profit, which is defined as the sum of processing margins over the entire scheduling horizon. Margins are the net surpluses from processing each type of raw material. In general, scheduling problems are formulated in either discrete or continuous time. In the present study, we use a discrete time formulation, in which the entire scheduling horizon is divided into a number of equal intervals, called periods. Flow characteristics such as flow rate, type of fluid, and concentration of material remain constant within each period. In conventional GA, first the decision variables are identified. In this example, during each period, the PUs operate at steady state. Hence, there are four variables (I1,1, I1,2, I2,1, I2,2) representing inflows to the silos, where the first subscript represents the silo and the second the parcel, and two variables for outflows from the silos (O1, O2). These six variables pertain to each of the three periods; thus there are a total of 18 variables over the scheduling horizon. To make time explicit, we use another index to represent the period. Thus the set of variables is {I1,1,1, I1,2,1, I2,1,1, I2,2,1, O1,1, O2,1, I1,1,2, I1,2,2, I2,1,2, I2,2,2, O1,2, O2,2, I1,1,3, I1,2,3, I2,1,3, I2,2,3, O1,3, O2,3}. The flow from the PU to the warehouse can be calculated from O1 and O2 (through mass balance) and is hence not represented as a separate variable. In conventional GA, the 18 variables are represented in the chromosome. The values of the two outflow variables determine profit. T=3
profit =
∑
(m1O1, t + m2O2, t )
t=1
(1)
Here, margins m1 and m2 are the surpluses per unit of P1 and P2, respectively. The operating rules described above act as constraints on the schedule and thus on the values that the 18 variables can be assigned. In traditional GA, these constraints have to be reflected through penalty functions. Consider the first operating rule that a parcel can be unloaded into at most one silo during each period. Its violation at each period t can be detected and penalized with a large penalty MH by
Figure 1. Illustrative scheduling problem.
2 ⎧ ⎪ M if ∑ I s , p , t − max(I1, p , t , I2, p , t ) ⎪ H a s=1 gt = ⎨ ⎪ > 0, where p = 1, 2 ⎪ ⎩0 otherwise
this example, a ship brings two grades of raw materials, parcels P1 (100 units) and P2 (150 units), which have to be unloaded into two silos. The materials from these silos are then charged to a processing unit (PU). The scheduling horizon is set to three periods (to illustrate the concepts graphically). The following are the operating rules in this problem: 5257
(2a)
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
called the graph’s size. A subgraph Gs = (Es, Vs) of G is a graph where Vs ⊆ V and Es ⊆ E. Sometimes a direction is associated with the edges. Such graphs are called directed graphs or digraphs in which any edge em is identified with an ordered pair (vi, vj) of vertices. A vertex containing all outward directed edges is called a source, while one containing all inward directed edges is called a sink. In a directed graph without cycles, a depth can be assigned to each vertex and is the minimum number of edges to reach a source. Every source vertex has a depth of 0. Consider a graph with directed edges (vj, vi) and (vk, vi). The depth dvi of vertex vi is given by
Operating rule b requires that each silo should have either an inflow or an outflow during a given period. The corresponding penalty gtb is given by ⎧ 2 ⎪ M if ⎪ ∑ Is , p , tOs , t > 0, where s = 1, 2 H gtb = ⎨ p=1 ⎪ ⎪ ⎩0 otherwise (2b)
Next, consider the inventory balance of each silo. 2
Vs , t = Vs , t − 1 +
∑
dvi = min(dvj, dvk) + 1
Is , p , t − Os , t
p=1
where dvj and dvk are the depths of vertices vj and vk, respectively. 2.1. Process Operation as a Graph. Many processes can be considered as a network of production units and therefore represented through graphs. In this representation, process equipment including storage tanks is denoted by vertices. Connections between the equipment through which materials flow are denoted by edges. The process network of the illustrative problem shown in Figure 1 is represented by the graph in Figure 2. It consists of five vertices V = {Parcel 1, Parcel 2,
Here Vs,t is the inventory in silo s at period t and Vs,0 is the initial inventory is known a priori. Operating rule c can be translated to a bracketed penalty gtc with a factor MB given by ⎧0 if 0 ≤ Vs , t ≤ 300, ⎪ ⎪ where s = 1, 2 gtc = ⎨ ⎪ ⎛ Vs , t ⎞2 ⎪ − M 1 ⎜ ⎟ otherwise ⎪ B⎝ 300 ⎠ ⎩
(4)
(2c)
Operating rule d requires at least one inflow stream to each PU; i.e., at least one outflow from the silos should be nonzero. ⎧ 2 ⎪ ⎪ M if Os , t = 0 ∑ H gtd = ⎨ s=1 ⎪ ⎪ ⎩0 otherwise
(2d)
Similarly, the other three operating rules can also be converted to penalty functions, with rules a−f requiring a penalty function for each period and a single constraint for rule g. The fitness of each chromosome is then calculated based on the corresponding schedule’s profit and constraint violations as T=3
fitness = profit −
∑
Figure 2. Graph representation of illustrative scheduling problem.
Silo 1, Silo 2, PU} and six directed edges E = {e1, e2, e3, e4, e5, e6}. The depth of each vertex is shown in parentheses. Process operations problems such as scheduling have to account for the network structure of the underlying process; hence graphs serve as a suitable representation scheme.29−33 Further, graph theory has been classically applied28 to flow network problems. In these, associated with each directed edge em is a non-negative capacity cm as well as a f low f m. All nodes except sources and sinks operate at steady state where the sum of the inflows into the node equals the sum of the outflows. The objective is to find the maximum flow from the sources in the network to the sinks while ensuring that the constraint f m ≤ cm is satisfied by each edge. Process scheduling shares many similarities with the flow network problem but requires some further generalization. Specifically, equipment does not necessarily have to be at steady state throughout the scheduling horizon; i.e., the vertices can have time-varying flows as well as material accumulation. The inventory in each vertex is restricted by its storage capacity (Si). Also, the flow through the edges does not have to be constant and could be time varying. Finally, the objective of operation is to maximize not the flow through the network but an economic function that may not necessarily require maximum flow. Despite
(gta + gtb + gtc + gtd + gte + gtf ) − g g
t=1
(3)
The conventional GA formulation above required 18 variables and 19 penalty functions. Such formulations would in general have inferior search performance for large-scale problems. However, structural insights can be used to significantly reduce the chromosome size and the necessary penalties, thus improving GA’s efficiency, as described next.
2. GRAPH REPRESENTATION OF SCHEDULING PROBLEMS A graph28 is an abstract representation of a set of objects where some pairs of the objects are connected by links. The objects are represented by mathematical abstractions called vertices, and the links are represented by edges. Typically, a graph is depicted diagrammatically as a set of circles for the vertices, connected by lines for the edges. Mathematically, a graph G = (V, E) consists of the set of vertices V = {v1, v2, v3, ..., vi, vj, vk, ...} and set of edges E = {e1, e2,e3, ..., em, ...} such that em is identified with an unordered pair (vi, vj) of vertices. The number of vertices, |V|, is called the order of the graph, while the number of edges, |E|, is 5258
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
from Silo 2 during this period. The absence of the other edges from the process network in Figure 2 in this subgraph denotes that there are no other flows during this period. The weight of I1,1 to edge e1 in Figure 3 also shows that the flow rate into Silo 1 is I1,1. Similar structural subgraphs for the other two periods are also shown in Figure 4. Such a period-wise compilation of weighted subgraphs fully depicts the schedule. 2.3. Scheduling Constraints. The graph-based representation of the schedule described above enables the classification of common constraints in scheduling problems along two dimensions: (i) structure and weight, and (ii) intraperiod and interperiod. Some constraints are associated with the edges present in the structural subgraph, and their violations can be directly identified without knowledge of the corresponding weights. Such constraints are called structural constraints. Consider operating rule a in the illustrative problem that prohibits the simultaneous unloading of any parcel into more than one silo. This translates to a constraint on edges e1−e4 from the process network in Figure 2. Specifically, this precludes the presence of e1 along with e3 in any structural subgraph; also, e2 and e4 cannot be simultaneously present. Similarly, operating rule b restricts the simultaneous presence of inflow and outflow edges from a silo. This means that if e1 and/or e2 is present in a structural subgraph, e5 cannot be present, and vice versa. Similarly for Silo 2, the presence of edge e6 prohibits e3 and e4 and vice versa. The structural constraints effect restrictions on the edges in the subgraphs and hence the size of the subgraphs. They can therefore be used to estimate an upper bound on the size of any feasible subgraph. First, we note that, in the absence of any scheduling constraints, the size of every structural subgraph would be equal to the size of the process network from which it is derived. Next, structural constraints relate to some subsets of the edges in the process network. For instance, in the process network in Figure 2, operating rule a relates to the subsets {e1, e3} and {e2, e4} while operating rule b relates to {e1, e2, e5} and {e3, e4, e6} as explained above. Finally, it is evident that the upper bound of the size of any graph can be derived from those of its disjoint subsets through linear additivity. For instance
the above differences from the flow network problem, the graph representation offers several insights that are useful in scheduling. 2.2. Schedule as a Graph. A schedule can be represented by time-varying graphs. A continuous time representation of a schedule would translate to a process network with a continuous f m associated with each edge, subject to f m ∈ [0 cm], and the discrete time one to edges whose flows f m are piecewise constant. We adopt discrete-time representation in this work but follow an alternate scheme to explicate the structure, whereby the schedule is associated not with one graph but a set of graphs―one graph for each time period. The schedule establishes a subset of the flows in the process network; i.e., it assigns zero flow to some edges. A subgraph can be formed where edges with zero flow have been eliminated from the process graph. We call such a subgraph a structural subgraph of the process network. To capture the complete information in the process schedule, the flow rate at each edge also has to be captured. We indicate the flow rates as the edge weights of the structural subgraphs. The complete schedule is then equivalent to an ordered set of structural subgraphs with weights. Consider the Gantt chart in Figure 3, which is one possible schedule for the illustrative problem in section 1.2. It can be
Figure 3. Gantt chart of a typical schedule for the illustrative problem.
|{e1, e2 , e3 , e4 , e5 , e6}| ≤ |{e1, e3}| + |{e2 , e4}| + |{e5 , e6}| =1+1+2=4 (5)
Therefore, any feasible subgraph in the illustrative problem would have a size of at most 4. In this particular example, complete enumeration of all feasible subgraphs would reveal a lower maximum size with at most three edges. This arises due to interaction between operating rules a and b with nondisjoint subsets. However, as an approximation, a fairly tight bound on the maximum size can be obtained even without enumeration of the effect of all possible interactions between constraints. Thus the structural constraints lead to an upper bound on the size of feasible subgraphs which can be exploited to derive a sparse representation of the problem. Constraints that relate to inventories and flow rates in the process would restrict the value of the vertex weights and edge flows. We call such constraints weight constraints. Operating rule c specifies a silo capacity of 300 units, and translates to an equivalent constraint on the inventory in the silo vertices. Similarly, rule e requires that the flow rates to each PU must be between 50 and 100 units/period, and restricts the flows to this
Figure 4. Graph representation of the schedule in Figure 3.
equivalently represented by the set of graphs shown in Figure 4. The three structural subgraphs in Figure 4 correspond to the three periods in the scheduling horizon. The structural schedule for Period 1 contains edge e1 indicating that Parcel 1 is unloaded to Silo 1 and edge e6 indicating that the PU is charged 5259
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
a subgraph are explicitly listed. Any edge that is not listed in the structural subchromosome is considered to be absent in the corresponding structural subgraph. The number of edges present in various subgraphs within a schedule and across different schedules can vary. The length of the structural chromosome would therefore also be variable, which would complicate the genetic operators. To obviate this, we use a fixed length encoding for each subgraph (i.e., for each period) and thus for the entire structural subchromosome. The maximum size of any possible subgraph, C, is calculated from the structural intraperiod constraints as explained in section 2.3. The total number of slots in a structural subchromosome is given by the product of C and the number of periods, T. The structural subchromosome for the schedule in Figure 4 is shown in Figure 6. It has 12 slots in total, with four slots
range if the edge is present in a subgraph. Weight constraints can be univariate or multivariate. The former relate solely to one vertex or edge, for instance operating rule e, while the latter are associated with multiple elements of the graph. Weight constraints are not explicitly depicted in the graph representation. An alternate classification of constraints is based on their temporal nature. Some constraints relate to each period individually and have no bearing on other (earlier or subsequent) periods. We call these intraperiod constraints. Consider operating rule d which requires that flow to the PU should persist throughout the scheduling horizon. This requires that, in each structural subgraph, either e5 or e6 should be present. The requirements on the structural subgraphs from operating rules a and b as discussed above are also applicable for each period individually and hence are intraperiod. Interperiod constraints arise from temporally dependent operating rules. For instance, operating rule f dictates that edge e1 and/or e3 occurs in periods preceding subgraphs containing edges e2 and e4. Operating rule g dictates that all parcels should be unloaded within the scheduling horizon. This means that, among all the subgraphs that form the schedule, e1 and/or e3 has to be present in at least one and similarly for e2 and/or e4. Thus, constraints that arise in scheduling can be captured in the proposed representation. In section 3, we propose a genetic algorithm whose chromosomes directly encode the schedule using this graph-based sparse representation.
Figure 6. Structural subchromosome for schedule in Figure 3.
reserved for each period. In Period 1, the first two slots are occupied by edges e1 and e6 while the third and fourth slots are blank as there are only two edges in this subgraph. Similarly, the slots for other periods are also filled. Conventional GA operators would not account for the graph underlying each structural subchromosome. We have therefore developed special-purpose operators that extend the basic principles of crossover and mutation to graphs. In structural crossover, the ordered set of subgraphs in each schedule is bifurcated at a crossover site (time period). The subgraphs after the crossover site are swapped between the two parents to produce two new daughter schedules. Figure 7a illustrates the case where parent structural schedules PS1 and PS2 are crossed at time period 3 to give two daughter structural schedules DS1 and DS2. In SAGA, two new structural subchromosomes are created by interchanging the parts of the parent chromosomes, with the front end of the first parent chromosome merged with the back end of the second to form a daughter. Figure 7b shows the two parent structural subchromosomes P1 and P2 being crossed at Period 3 to yield two daughter structural subchromosomes D1 and D2. The crossover site in the chromosome is between periods; therefore, the crossover operator ensures that the length of each daughter subchromosome, and thus the original encoding, is preserved. In structural mutation, a random edge in one subgraph of the schedule is stochastically mutated―deleted, replaced, or added. Specifically, if an edge in the process network is already present in the subgraph, it may be removed or replaced with another edge sharing one of the vertices. On the other hand, if an edge in the process network is not present in the subgraph, it may be stochastically added. Figure 8a illustrates the case where a subgraph of DS2 is mutated by replacing edge e1 with e3. Figure 8b shows the mutation of a structural subchromosome D2 to produce DM2. The structural crossover operator described above ensures that all intraperiod constraints are satisfied; however, interperiod structural constraints could be violated by the resultant daughters. In Figure 7, among the two daughter schedules, DS1
3. STRUCTURE ADAPTED GA The structure adapted GA, SAGA, is an adaptation of the conventional GA (specifically, NSGA-II) that represents feasible schedules using structural subgraphs with weights. In SAGA, each chromosome (as shown in Figure 5) consists of two parts,
Figure 5. SAGA chromosome.
one encoding the schedule structure and the other encoding the weights. These are called structural and weight subchromosomes and are explained in detail next. 3.1. Structural Subchromosome. The structural subchromosome encodes the ordered collection of structural subgraphs that represents the schedule. Graphs can be encoded in GA chromosomes in various ways. In the simplest approach, the process network can be used as a template for encoding. An edge that is present in a given subgraph can be denoted by a binary “1”, and any that are absent can be denoted by a binary “0”. This simple approach is not memory efficient especially for large problems since a large number of edges in the process network are not present in most subgraphs. Therefore, we adopt a sparser representation where the set of edges present in 5260
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
Figure 7. Structural crossover between two parent schedules at Period 3 (a) in structural subgraphs and (b) its representation in structural subchromosomes.
is structurally feasible while DS2 violates operating rule g and is infeasible. Also, structural mutation preserves neither intra- nor interperiod constraints as is evident in Figure 8, where DM2 violates operating rules b and g and thus has both intra- and interperiod infeasibility. Feasibilities of all daughter subchromosomes produced after crossover and mutation are therefore evaluated and repair is undertaken if they violate any structural constraints. Correction is performed by adding, deleting, or replacing edges to make the schedule structurally feasible. This is performed in a depth- and period-wise order wherein the outward directed edges of each vertex, starting from sources, are evaluated for structural constraint violation. Edges in all source vertices (which have a depth of 0) in the subgraph for Period 1 are repaired as necessary; this is then repeated for source vertices in Period 2 and so on until Period T. Thereafter, edges of vertices at depth 1 are repaired in a similar fashion. The process is completed when the sink vertices of the subgraph for Period T have been reached. Note that correction of a given vertex may lead to new infeasibilities in vertices at higher depth. This depth-wise sequence of correction ensures that any induced infeasibilities would also be automatically detected and rectified. However, due to this, deeper edges change more frequently since they could be affected when an edge at lower
Figure 8. Structural mutation of edge e1 to edge e3 (a) in structural subgraph and (b) its representation in structural subchromosome. 5261
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
depth is mutated. To account for this effect, the probability of mutation is made depth dependent with deeper vertices having a lower structural mutation rate. In the illustrative problem, the structural subchromosome DM2 produced after structural crossover and mutation has two infeasibilities: (1) violation of operating rule b in Period 2 as Silo 2 has both inflow and outflow edges, and (2) violation of operating rule g in Period 3 as there is no outward edge from the Parcel 2 vertex in the entire scheduling horizon. The former is associated with a vertex at depth 1 and the latter is associated with a depth 0 vertex. Following the above procedure, the infeasibility at depth 0 is repaired first. Figure 9a shows this in
Figure 10. Intraperiod correction by replacing edge e6 by e5 (a) in subgraph and (b) its representation in structural subchromosome.
used for the weight subchromosome. The crossover site is selected to be the same in both structural and weight subchromosomes; no such restriction is necessary for the mutation site. Correction is not required for this subchromosome since a dynamic bound generation is used during decoding to ensure that all univariate weight constraints are satisfied by every chromosome. This dynamic bound generation is explained in detail next. Let am,t ∈ [0 1] be the value in the weight subchromosome corresponding to the mth edge in period t. The flow of the edge f m,t is then given by
Figure 9. Interperiod correction by inserting edge e4 (a) in subgraph and (b) its representation in structural subchromosome.
fm , t = am , t (UBm , t − LBm , t ) + LBm , t
the corresponding subgraph of DM2 by addition of edge e4 in order to ensure that Parcel 2 is unloaded at least once in the entire scheduling horizon. Figure 9b shows the same correction in chromosomal form―edge e4 has been added to DMC2. This correction induces a new infeasibility since e4 and e6 are now present simultaneously in Period 3, which violates operating rule b. In the next step, both violations at depth 1 are removed. Figure 10a shows the correction by replacing edge e6 by e5 in Period 2 first and then in Period 3 to eliminate both infeasibilities. These changes are shown in chromosomal form in Figure 10b and lead to a completely feasible chromosome DMCC2. The feasibility of the structural subchromosome is thus ensured by the correction operator, which is used during generation of both the chromosomes in the first generation and the daughter chromosomes arising from the genetic operators. 3.2. Weight Subchromosomes. The weight subchromosome complements the structural chromosome with additional schedule related information: specifically it provides the weights for all the edges that are present in the structural subchromosome. The size of the weight subchromosome is the same as that of the structural subchromosome (C·T); however, its contents are numeric values in the range [0 1]. The conventional genetic operators for bit-wise crossover and mutation are
(6)
where UBm,t and LBm,t are the upper and lower bounds of flow through that edge. Prespecified fixed values ( f mmax and f mmin) can be used for nodes when the bounds arise from hardware limitations or operational constraints. For instance, operating rule e in the illustrative problem leads to UB5,t = f5max = 100, UB6,t = f6max = 100, LB5,t = f5min = 50, and LB6,t = f6min = 50. Vertices with inventory holdups have additional mass balance constraints that lead to further limitations on the inflows and outflows. Timevarying or dynamic bounds are necessary for such edges. To calculate the relevant bounds, inventories of the vertices are calculated first depth-wise in each subgraph and then periodwise across the time horizon. The inventory Vk,t of the kth vertex at period t is given as Vk , t = Vk , t − 1 +
∑ Ik − ∑ Ok
(7)
where ∑Ik and ∑Ok are the summations of the inflows and outflows from the vertex and Vk,0 is the initial inventory. When multiple edges are simultaneously active for a vertex, the values of their flows interact, which leads to a multivariate constraint on each of the edges. This can be addressed in the general case by assigning all their flows simultaneously. In the special case 5262
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
limitation f6max (=100) and f 6max (=50), respectively. Using these and the stochastic weight of 0.4
where a vertex has one active edge, univariate bounds result and can be calculated explicitly: UBm , t = min(fmmax , |Vi , t − 1 − Vimin|)
(8)
LBm , t = fmmin
(9)
f6,1 = 0.40(100 − 50) + 50 = 70
In this case, fortuitously no other edges are active; hence this flow assignment would not lead to any mass balance violations, but composition balance constraints could be affected. Therefore a penalty function is required as explained in section 4.1. In Period 2, source edge e3 is active. Using dynamic bounds, the stochastic weight of 0.3 is converted to f 3,2 as
In SAGA, dynamic bounds are applied only to such vertices that have only one active edge (such as source vertices). This ensures their weight constraints are always satisfied. For vertices with multiple active edges instead of simultaneous flow assignments which is computationally expensive (since it requires the solution of a nonlinear program, NLP, in the general case), we use univariate static bounds such as hardware limitations that are known a priori. A corresponding penalty function is then added to the fitness function to penalize the original multivariate weight constraints. This approach is computationally efficient and also ensures that no feasible solution is eliminated from the search space. As an example, consider the assignment of flows for the edges in the structural subchromosome DMCC2. Let the stochastically generated weight subchromosome W be as shown in Figure 11. Flows are assigned period-wise starting from the
min |) UB3,2 = min(f3max , |VParcel 1,1 − VParcel 1
= min(200, |17 − 0|) = 17 LB3,2 = f3min = 0 f3,2 = 0.30(17 − 0) + 0 = 5.1
Thus, the inventory balance of Parcel 1 is propagated to Period 2. Similarly, all other flows are assigned from the stochastic weights as shown in Figure 11. Thus, the state of the entire process network can be recreated from the structural and weight chromosomes. The size of these chromosomes grows slowly with increasing problem size since SAGA stores only the nonzero variables in the process network. Also, since all structural constraints and many weight constraints can be directly enforced in each chromosome without the need for penalty functions, the search performance becomes efficient. 3.3. SAGA Algorithm. The complete SAGA algorithm is shown in Figure 12. Given a process network, scheduling data, and a set of operating rules (constraints) indexed to the various vertices and edges of the network, the maximum size C of the subgraph is first estimated. During initialization, Np chromosomes are generated randomly. While generating each chromosome, the structural subchromosome is generated first and its feasibility vis-à-vis structural constraints is ensured. Then the corresponding weight subchromosome is generated. In the decoding step, the values in the weight subchromosome are mapped to flows of the corresponding edges in the process network and the state of the process network during each of the T periods recreated. These flows are used to calculate all the objective functions of the original optimization problem (model evaluation). Since all structural constraints are satisfied during schedule generation and univariate weight constraints are satisfied using dynamic bounds, only multivariate weight constraints need to be penalized. These penalty values, and thus the fitness function for each chromosome, are also evaluated. Chromosomes are then ranked based on the concept of nondomination10,11 and assigned a crowding distance value. They are then copied into a mating pool using tournament selection. Two chromosomes are picked randomly from the mating pool of Np parent chromosomes and undergo genetic operations to produce daughters. As per the elitism operation, the Np daughters are mixed with the Np parents and the 2Np chromosomes are re-ranked to identify the best Np chromosomes. These Np chromosomes are then used as the starting population for the next generation. In section 4 we use SAGA for optimizing crude oil operations in a refinery.
Figure 11. Decoding the weight subchromosome to establish the flows in the process network.
source vertices in each structural subgraph. Source vertices can have at most one active edge, so we use univariate, dynamic bounds that obviate penalty functions. In Period 1, e1 is the only active edge from the source nodes, i.e., e2, e3, and e4 are inactive and f1,2 = f 2,1 = f 2,2 = 0. The flow through e1 is only limited by Parcel 1’s initial inventory, VParcel 1,0, its minimum min desired inventory VParcel 1 , and the hardware limitation on the flow through the pipeline f 1max. Based on the stochastic value of 0.83 assigned in W, f1,1 can be calculated using eqs 8 and 9 as min |) UB1,1 = min(f1max , |VParcel 1,0 − VParcel 1
= min(200, |100 − 0|) = 100
LB1,1 = f1min = 0 f1,1 = 0.83(100 − 0) + 0 = 83
4. REFINERY CRUDE OIL SCHEDULING Petroleum refining involves separating crude oil into its constituents and converting them into high value products. In
Edge e6 in period 1 is assigned a flow f6,1 based on static bounds derived from maximum and minimum pipeline flow 5263
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
In this work, we demonstrate the application of SAGA to schedule crude oil operations in a marine access refinery― from unloading schedule for each VLCC/vessel to assignment of storage tanks to CDU―in order to maximize the gross profit. Profit is the difference between the crude margins and the logistics costs such as sea-waiting cost (or demurrage) of VLCC, crude-mix changeover losses, and safety-stock penalty. As shown in Figure 10, the general problem is defined as follows:37 Given 1. crude delivery data: estimated arrival times of ships, their crude parcels, and parcel sizes 2. maritime infrastructure description: jetties, jetty-to-tank and SBM-to-tank connections, crude unloading transfer rates, and SBM pipeline holdup volume and its resident crude 3. tank farm data: storage tanks, their capacities, their initial crude stocks and compositions, and crude quality specifications or limits 4. crude processing data: CDUs, processing rates, and crude quality specifications 5. economic data: demurrage rates, crude changeover costs, safety-stock penalties, crude margins, and product demands Determine 1. unloading schedule for each ship including the timings, rates, and destination tanks for all parcel transfers 2. inventory and crude concentration profiles of all storage tanks 3. charging schedule for each CDU including the feed tanks, feed rates, and timings Subject to the operating practices A. A parcel can unload to only one storage tank at any period, but may unload to multiple tanks over time. B. A storage tank cannot receive and feed crude at the same time. C. The prescribed sequence in which a VLCC unloads its parcels is known a priori. This is normally fixed when the VLCC loads its parcels and is prespecified by the refinery. D. Up to two parcels can be unloaded in any period. E. All parcels have to unload in the given time horizon. F. During operation, CDUs should not shut down. G. Only one VLCC can dock at the SBM station at any time. H. Multiple tanks can feed a CDU simultaneously and vice versa. I. After each crude receipt, a tank needs one period to settle and remove brine. Assuming 1. Crude flow is plug flow in the SBM. 2. Holdup of the SBM line is far smaller than a typical parcel size. Thus, only one type of crude resides in the SBM line at the end of each parcel transfer. 3. Holdup of the jetty pipeline is negligible. 4. Crude mixing is perfect in each storage tank. 5. Crude changeover times are negligible. In order to solve the refinery crude oil scheduling problem, many MILP and MINLP models have been formulated with either discrete or continuous time representation. The flow rate− composition bilinearity in tank composition poses a significant
Figure 12. Flowchart of SAGA.
general, refineries process several crudes. In a marine access refinery, crudes are supplied through marine transportation. Crudes arrive in either large multiparcel tankers or small singleparcel vessels. Very large crude carriers (VLCCs) have multiple compartments to carry several parcels of different crudes. Due to their large size, they have to be docked offshore at a singlebuoy-mooring (SBM) station. A refinery typically has one SBM and it is connected to the crude storage tanks in the refinery by a long pipeline. Crude can also be received in small ships that dock at onshore jetties. A refinery can have multiple such jetties where several ships can be unloaded simultaneously. Refinery crude oil operations involve unloading crudes from the ships into storage tanks and feeding the crude distillation units (CDUs) from these tanks. Fluctuations in crude oil prices and product demands are essential characteristics of the refinery supply chain. The manufacturing process is complex with low profit margins. As noted by Reddy et al.,34 optimal scheduling leads to increase in throughput, intelligent use of less expensive crude stocks, better quality control, reduction in sea-waiting cost of ships (demurrage), and improved control and predictability of downstream processing, thus saving millions of dollars per year.35,36 Therefore, advanced planning and scheduling of crude oil operations is essential. 5264
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
flow assignment. The remaining are multivariate in nature and necessitate penalty functions. Hard penalties are used for tank heel Vkmin and capacity Vkmax violations:
challenge in such formulations and has been handled through linear approximation,38,39 discretization of CDU feed rates,40 and iterative procedures.34,41,42 4.1. SAGA for Refinery Crude Oil Scheduling. Figure 13a shows a problem with four VLCC parcels, four tanks, and two CDUs. The scheduling horizon is of nine periods. This process network is shown in Figure 13b and comprises 10 vertices and 12 edges. Operating practices A−G lead to structural constraints. As described in section 2.3, structural constraints can be used to estimate an upper bound on the size of any feasible subgraph of the process network. In the current case, from operating practice D, at most two edges among e1−e8 can be active at any period. The edges e9−e12 can all be active simultaneously from operating practice H, so the upper bound on any feasible subgraph of Figure 13b is 6. The operating practices are also applied when generating structural subchromosomes. The weight constraints are limits on inventory capacities and component specifications at the vertices. The weight constraints on inventory capacities at the source vertices are univariate and accommodated using dynamic bounds during
⎧0 if Vkmin ≤ Vk , t ≤ Vkmax , ⎪ gta = ⎨ where k = 1, 2, ..., 8 ⎪ ⎩ MH otherwise
(10a)
The amount of crude Q processed over the entire scheduling T=9 12 horizon is given by Q = ∑t=1 ∑m=9 f m,t. A bracketed penalty is used to ensure that this does not exceed the total demand D. ⎧0
if
⎪ gtb = ⎨
Q≤D
⎛Q ⎞2 ⎪ MB⎜ − 1⎟ otherwise ⎠ ⎩ ⎝D
(10b)
Constraints on crude quality specifications are reflected using bracketed penalties for each vertex and each quality specification c.
⎧ ⎛ 2 xc , k , t ⎞ ⎪ ⎜ min min ⎪ MB⎜1 − min ⎟⎟ if xk > 0 and xc , k , t < xk , where k = 1, 2, ..., 8 xc , k ⎠ ⎪ ⎝ ⎪ c 2 gt = ⎨ ⎛ xc , k , t ⎞ ⎪ ⎜ max ⎟ ⎪ MB⎜1 − max ⎟ if xc , k , t > xk > 0, where k = 1, 2, ..., 8 xc , k ⎠ ⎪ ⎝ ⎪ ⎩0 otherwise
(10c)
S is the safety-stock limit and Vs,t is the inventory in tank s at period t. The fitness function corresponds to profit maximization,
where xc,k,t is the cth quality at the kth vertex during period t. The objective is to maximize profit given as T=9
profit =
∑
(m1f9, t + m2f10, t + m3f11, t + m4 f12, t )
t=1 (11)
Here, m1−m4 are the margins for the various crude parcels and f m,t is the flow through the mth edge at the tth period. The logistics cost L includes the sea-waiting cost (or demurrage) of VLCC, crude-mix changeover losses, and safety-stock penalty in each period. ⎡ 12 ⎤ q2⎢ ∑ R m , t ⎥ ⎢ ⎥ ⎦ i=1 t = 2 ⎣m = 9 4 T=9 ⎡ ⎤ + ∑ q3⎢S − ∑ (Vs , t )⎥ ⎢ ⎥ ⎦ t=1 ⎣ s=1 1
L = q1 ∑ (tiu − tia) +
T=9
∑
(12)
where q1, q2, and q3 are the cost factors associated with demurrage, changeover losses, and safety-stock penalty. tia and tiu are the VLCC arrival and actual unloading times. Rm,t represents a changeover associated with edge em from period t − 1 to period t. ⎧ 0 if state of edge e is the same during m ⎪ R m,t = ⎨ periods t − 1 and t ⎪ ⎩1 otherwise
Figure 13. (a) Refinery crude oil scheduling network of example 1 and (b) its process network. 5265
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
weights), and penalties (hard and bracketed). Conventionally, GA parameters are selected using trial-and-error or using values and guidelines in the literature.10−13 In SAGA, literature values were used for number of generations, population size, and genetic operator probabilities for weights; the probabilities for the structural mutation operator alone were selected using trialand-error. The penalties were set to very high values. First, a single objective optimization was solved to determine schedules that maximize profit. These results were compared with those in the literature with an MILP formulation37 (RRA algorithm). Table 2 compares the size of the MILP formulation37 with that of SAGA. A GA would have the same number of variables as the MILP formulation. For the SAGA formulation, the upper bound on the size of any feasible subgraph of the process network for each group of problems is reported. The size of the chromosome is twice the product of this upper bound and the number of periods in the scheduling problem, with the factor of 2 arising from the two equisized subchromosomes storing the structural and weight parts of the schedule. As is evident from Table 2, the SAGA formulation offers a much sparser representation leading to a reduction of 72−95% in the number of variables. The size of the MILP formulation grows with the product of several factors such as (number of parcels × tanks × periods) and (number of tanks × number of CDUs × periods). In contrast, the SAGA formulation size grows slowly since it is only linearly proportional to the scheduling horizon (number of periods) as only nonzero variables are explicitly stored in the chromosome. Compared to conventional GAs, SAGA also requires far fewer number of penalty functions with a reduction of 47−84%. For the biggest examples in this study (examples 22−24), the total number of constraints is 8223, all of which have to be handled using penalty functions in a conventional GA. With SAGA, this is reduced to 1323 since most of the constraints are accommodated a priori by the graph representation. Table 3 compares the computational performance and the profit of the schedule identified using SAGA and the MILP formulation reported by Li et al.37 For the latter, we have
Table 1. SAGA parameters no. of generations Np lstr probability of crossover Pcross probability of jumping-gene operation PJG probability of mutation Pmut for structural subchromosomes vertices at depth = 0 (source vertex) vertices at depth = 1 vertices at depth = 2 probability of mutation Pmut for weight subchromosomes penalty value used in hard constraints MH large value used in bracketed penalty MB
8000 100 10 0.9 0.5 0.3 0.05 0.03 1/lchrom 500 108
accounting for the logistic costs, and constraint violation penalties. T ⎛
fitness = profit − L −
⎞
∑ ⎜⎜gta + gtb + ∑ gtc⎟⎟
t=1 ⎝
c
⎠
(13)
The SAGA procedure described in section 3 is performed thereafter. 4.2. Results. SAGA has been implemented in Absoft Pro Fortran 9.0 and used for solving the 24 crude scheduling examples reported by Li et al.37 Of these, examples 1−6 are small with scheduling horizons of 7−21 periods, examples 7−21 are medium with 42 periods, and examples 22−24 are large with 60 periods. Complete details of all examples such as vessel arrival data, tank capacities, heels, initial inventories, crude concentration ranges in tanks and CDUs, transfer rates, processing limits, operation costs, crude margins, demands, and specifications on crude feed quality are given in the Supporting Information (see Tables S1−S5). Table 1 lists all the SAGA parameters used for solving the examples: number of generations, population size, binary string length, genetic operator probabilities (for graphs and
Table 2. Comparison of SAGA and Conventional Formulation Size
5266
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
Table 3. Comparison of SAGA and MILP Formulation37 Resultsa SAGAb
MILP (same time limit as SAGA)c
% difference between SAGA and MILP (no time limit)
MILP (no time limit)c
example
profit
CPU time
profit
CPU time
profit
CPU time
profit
CPU time
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 17 18 19 16 20 21 22 23 24
5081.16 0.9433 × 108 3384.64 3407.25 3386.94 3299.49 4404.06 4566.67 4532.62 4496.56 4526.07 4118.75 4306.82 4022.21 4064.84 4400.16 4430.59 4487.30 4707.76 4695.50 4719.92 7409.04 7329.49 7408.25
176 236 579 571 565 632 1404 1326 1415 1418 1387 1364 1278 1344 1330 1376 1356 1362 2027 2063 2138 3694 3654 3718
5069.94 1.0119 × 108 − 3443.46 3072.34 3320 4584.04 4588.99 4600.14 − − 4167.08 − 4123.18 − 4505.07 − 4591.05 4725.76 4740.11 4747.09 − − −
2 2 − 151 157 23 1022 446 317 − − 461 − 1264 − 426 − 226 1579 1393 359 − − −
5069.94 1.0119 × 108 − 3443.46 3072.34 3320 4584.04 4588.99 4600.14 4592.24 4601.37 4167.08 4431.29 4123.18 4553.76 4505.07 4440.25 4591.05 4725.76 4740.11 4747.09 − − 7425.25
2 2 − 151 157 23 1022 446 317 3090 5320 461 48860 1264 13957 426 18318 226 1579 1393 359 − − 94602
0.22 −6.78 − −1.05 10.24 −0.62 −3.93 −0.49 −1.47 −2.08 −1.64 −1.16 −2.81 −2.45 −10.74 −2.33 −0.22 −2.26 −0.38 −0.94 −0.57 − − −0.23
7285 10309 − 278 259 2684 37 196 346 −54 −74 195 −97 6 −90 223 −93 503 28 48 495 − − −96
a
Processor: Intel Core 2 Duo CPU, E8500 at 3.26 GHz, 3.17 GHz. RAM: 4 GB (3.25 GB usable). System type: Win 7, 32 bit OS. bAbsoft Pro Fortran 9.0. cGAMS version: 23.7.3 WIN 27723.27726 VS8 x86/MS.
Table 4. Schedule for Example 1 crude amount discharged [to CDU No.] and charged (from vessel no.) in kbbl in a respective tank in a given period tank period 1 2 3 4 5 6 7 8 9
Figure 14. Convergence plot for example 1.
calculated results in two ways: (1) one where the CPU time for the solver was limited to the same amount as for SAGA and (2) where the allowed CPU time was effectively unlimited (106 s). It should be noted that any direct comparison of the computational performance between a stochastic optimization approach such as SAGA and a deterministic optimization method is misleading given their differing characteristics. For instance, the time-limited MILP did not find any feasible solution for eight of the 24 problems. Therefore, rather than the absolute computational time, it is more revealing to evaluate the broad trends with increasing problem size. As expected, the computational time for SAGA increases linearly with problem size (R2 = 0.98) with little variation between problems of a given size. In contrast, there is a significant variation in the per-
T1 −50.00 [1] −67.44 [1] −56.25 [1] −51.71 [1]
T2
T3
T4
−50.00 [2] −50.00 [2]
+10.00 (1), +300.00 (2)
+300.00 (3)
−50.00 [2] +340.00 (4)
−57.28 [2] −91.69 −53.17 −50.19 −54.88 −93.74
[2] [2] [2] [2] [2]
−51.17 −50.39 −50.97 −76.58 −87.43
[1] [1] [1] [1] [1]
formance of the MILP formulation; for example, among the 12 medium-sized problems, the time requirements range from 226 to 48 860 s. The corresponding range for SAGA is 1278−1418 s with an average of 1363 s and a standard deviation of 41 s. Despite its stochastic nature, with no gradient-based search, the quality of the result obtained from SAGA is surprisingly close to the optimal solution obtained from the MILP formulation. In two cases, the profit of the schedules identified by SAGA is larger than that identified by the MILP formulation, possibly because of the optimality gap. In the remaining cases, SAGA results are on average 1.5% less with a maximum of 11% in one case. This shortfall is due to the well-known problem of slower convergence10,11 of 5267
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
Table 5. Schedule for Example 18 crude amount discharged [to CDU No.] and charged (from vessel no.) in kbbl from a tank in given period tank period
T1
T2
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
−20.02 [3] −20.00 [3] −26.92 [3]
−20.00 −20.00 −20.00 −20.00
T3
[1] [1] [1] [1]
T4
+149.40 (4)
−20.04 [3] −22.56 [3] −20.00 [3]
+200.00 (6)
+240.00 (8)
−20.15 −20.00 −22.81 −20.27 −20.07 −20.37 −20.03 −20.00 −25.80 −23.06 −20.48 −20.31 −20.00 −20.00 −22.38
[3] [3] [3] [3] [3] [3] [3] [3] [3] [3] [3] [3] [3] [3] [3]
−20.00 −20.15 −20.00 −20.00 −20.46 −20.00 −20.00 −20.00 −20.00 −20.00 −20.00 −20.23 −20.00 −20.50 −36.33 −20.00 −20.00
[2] [2] [2] [2] [2] [2] [2] [2] [2] [1], [1], [1], [1], [1], [1] [1] [1]
−20.00 −20.00 −20.00 −20.00 −20.00
[2] [2] [2] [2] [2]
−21.25 −22.48 −20.15 −20.21 −20.00 −20.00 −20.00 −20.00 −20.00 −20.00 −23.02 −20.15 −20.00 −20.00 −20.00
[1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1] [1]
T5
−20.00 [1], −20.00 −20.00 [1], −20.62 −20.00 [1], −36.77 −20.00 [1], −20.00 −20.00 [1], −20.00 −29.38 [1], −30.81 −30.01 [1], −20.00 −20.00 [1], −30.01 −25.00 [1], −22.89 −20.00 [1], −20.00 −20.00 [1], −30.01 −25.00 [1], −20.07 −20.00 [1], −20.00 −20.07 [2] −20.00 [2] +10.00 (5) +250.00 (7)
[2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2]
T6
T7
+350.00 (2) +150.60 (4)
−31.57 −20.00 −22.50 −20.00 −20.93 −20.04
−20.00 −20.00 −20.11 −20.00
[2] [2] [2] [2] [2] [2] [1], [1], [1], [1], [1], [1], [1],
[3] [3] [3] [3] [3] [3]
−20.84 [3] −20.00 [3] −22.73 [3] −20.66 [3] −27.50 [3] −20.00 [3] −20.00 [3] −20.31 [3] −20.00 [3] −20.00 [3] −27.89 [3] −20.00 [3] −20.31 [3] −20.33 [3] −20.00 [3] +10.00 (9)
[2] [2] [2] [2]
+250.00 (10) −20.00 −20.00 −20.00 −20.21 −20.00 −25.00 −27.78 −20.00 −20.66 −20.62 −25.00 −20.00 −22.91
T8 +10.00 (1) +200.00 (3)
+250.00 (11) +190.00 (12)
−29.81 −20.00 −20.06 −21.91 −20.00 −20.00 −31.82
[2] [2] [2] [2] [2] [2] [2]
example 1 are shown in Figure 15. The feed rates to the CDUs change significantly at several points. Such drastic changes in feed flow rate may disrupt the hydraulic operation of the CDU, lead to off-specification distillation cuts, and destabilize the column. A secondary objective to minimize the deviations in CDU feed rates was therefore included in SAGA. The fitness function for the second objective is
evolutionary algorithms as they approach the optima. The average profit in each generation is shown in Figure 14. As can be seen, there is very little improvement in the solution quality after the first 100 generations. To study robustness, 20 different instances of example 1 were solved using SAGA starting from different random seeds. In all cases, the maximum deviation in the results was less than 0.1%, indicating that the solution approach is robust even though it relies on stochastic search. Like any population-based search approach, SAGA can identify multiple solutions for a problem if they exist. Two schedules with nearly equal profit were identified in several cases. The schedules for examples 1, 18, and 22 are shown in Tables 4−6. The loading−unloading pattern of crudes in these schedules shows that when any tank starts feeding a CDU, it continues on unless some constraint such as inventory capacities is violated. This is to be expected since this would reduce the logistic cost of changeover. The crude inventory and CDU feed profiles for
⎛ ⎞⎞ T ⎛ fitness 2 = −⎜σ2 + ∑ ⎜⎜gta + gtb + ∑ gtc⎟⎟⎟ ⎜ ⎟ ⎠⎠ ⎝ t=1 ⎝ c
(14)
T=9
σ2 =
∑
{[(f9, t + f10, t ) − (f9, t − 1 + f10, t − 1 )]2
t=2
+ [(f11, t + f12, t ) − (f11, t − 1 + f12, t − 1 )]2 } 5268
(15)
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
Table 6. Schedule for Example 22 crude amount discharged [to CDU No.] and charged (from vessel no.) in kbbl in a respective tank in a given period tank period
T1
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
−20.00 [3] −38.76 [3] −21.52 [3] −29.38 [3] +113.73 (5) +180.00 (7) −10.15 −24.69 −22.97 −23.16 −40.29 −20.00 −20.00 −36.26
[3] [3] [3] [3] [3] [3] [3] [3]
−32.82 [3] −37.04 [3]
−25.00 −30.00 −20.00 −30.00
[3] [3] [3] [3]
T2
−20.00 −30.00 −31.10 −20.00 −32.55 −22.93 −43.81 −25.94 −22.07 −20.00
[3] [3] [3] [3] [3] [3] [3] [3]
T4
[1] [1] [1] [1] [1] [1] [1] [1] [1] [1] −20.86 [1] −20.00 [1] −20.00 [1] −20.00 [1] −31.26 [1], −20.03 −20.97 [1], −38.06 −21.25 [1], −26.33 −32.51 [1], −21.32 −20.00 [2] +94.99 (13)
+270.00 (16)
T5
−30.94 −20.00 −30.00 −23.75 −40.25 −31.26 −21.32 −25.00 −20.00 −43.30 −20.00 −20.00 −20.00
[1] [1], [1], [1], [1], [1],
−55.03 −30.00 −20.07 −20.00 −25.66
−41.27 [1] −26.02 [1] −21.72 [1], −22.50 [2] −21.25 [2] +230.00 (9)
[2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2] [2]
[2] [2] +210.00 (15) [2] [2] +170.65 (12)
−39.08 −21.87 −20.00 −20.00 −20.00 −20.00
−20.31 [1] −20.00 [1] −30.08 [1]
−40.01 [2] −21.25 [1], −25.63 [2] −20.00 [1], −20.00 [2] +186.66 (26) −41.93 [1], −20.62 [2] +170.00 (23) −25.31 [1], −20.62 [2] +180.00 (24) −26.31 [1], −23.08 [2] −47.05 [1], −24.30 [2] −10.00 [3] −45.02 [1] −31.30 [2] −21.52 [3] −51.43 [1] −20.00 [2] −47.09 [3] −20.23 [1] −20.00 [3] −10.78 [1] −10.00 [1] −27.54 [3] −30.04 [1] −20.00 [1] −36.18 [3] −38.61 [3] −10.50 [1] −11.64 [1] −20.00 [3] −20.00 [1] −20.31 [3] −40.17 [1]
[1] [1], [1], [1], [1], [1], [1], [1], [1]
T7
T8
−20.00 −36.89 −40.01 −20.00 −40.01 −27.11 −20.00
[1] [1], [1], [1], [1], [2]
+190.00 (4) +36.27 (5) +150.00 (8)
+10.00 (10) +150.00 (14)
+69.35 (12)
[2] [2] [2] [2] [2]
−20.00 −55.03 −20.23 −20.00 −20.00 −20.00 −24.37 −20.00 −20.00
T6
+10.00 (1)
+95.01 (13)
−43.61 −20.00 −20.00 −28.75 +150.00 (20) −20.00 −21.91
−20.00 −34.42 −20.00 −23.24 −24.77 −20.00 −20.00 −20.00
T3
−20.00 [1], −40.02 [2] −22.58 [2] +250.00 (3) −22.65 [2] +220.00 (6)
+200.00 (2)
−21.56 −20.00 −20.00 −20.00
−20.00 −20.00 −30.67 −40.02
−20.00 [3]
+10.00 (21)
−23.75 [3] −20.00 [3] −20.00 [3] −32.00 [3] +300.00 (25)
+63.34 (26) −37.51 −45.18 −35.64 −10.03 −50.02 −20.00 −25.20 −40.72 −40.17 −30.00 −21.48 5269
−41.54 [3] −20.00 [3] −26.60 [3] −46.27 [3] +300.00 (11) +10.00 (14)
[2] [2] [2] [2]
+180.00 (19)
[2] [2] [2] [2] [2] [2] [2]
[3] [3] [3] [3]
−20.62 [3] −20.00 [3] −33.17 [3]
+200.00 (17) [2] [2] [2] [2]
−20.00 −23.12 −20.03 −10.00
−20.00 [3] −36.26 [3]
−20.00 [3] −30.00 [3]
−20.00 [3] −20.00 [3] −23.12 [3] +250.00 (18)
+200.00 (22)
[3] [3] [3] [3]
[2] [2] [2] [2] [2] [2] [2] dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
Figure 15. Crude inventory and CDU feed rate profiles for schedule A in example 1.
Figure 16. Pareto optimal fronts for (a) example 1, (b) example 18, and (c) example 22.
where ( f 9,t + f10,t) and ( f11,t + f12,t) are the total crude inflow rates for CDU-1 and CDU-2 at any period t, respectively. This objective seeks to minimize the sum of square deviation of CDU flow rates between adjacent periods while penalizing infeasibilities as in eq 13. The results for examples 1, 18, and 22 are shown in the form of a Pareto plot in Figure 16. The schedule in Table 3 identified by the single objective optimization (with a profit of $5081.16) is shown as point A in Figure 16a. The high degree of interperiod changes in this schedule is reflected by its large σ2. The Pareto plot also reveals many other schedules with marginally lower profit but with significantly
lower extent of flow rate fluctuations. One such solution with a profit of $5064.25 is marked as B in Figure 16a. Figure 17 shows the corresponding CDU feed flow rate profiles. The Pareto fronts for examples 18 and 22 in Figure 16 show a more gradual variation between profit and σ2 with no sudden slope changes. A profit to σ2 trade-off assessment is therefore required to identify a robust operating point in these cases. These examples bring out the need for multiobjective optimization (MOO) studies. The computational time required by SAGA for solving the MOO extensions was nearly the same as in the single objective case. 5270
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
Article
advances in multi-threaded and parallel programming have the potential to benefit the computational performance of SAGA. This will also be investigated in the future.
■
ASSOCIATED CONTENT
* Supporting Information S
Complete details of all examples such as vessel arrival data, tanks capacities, heels, and initial inventories, crude concentration ranges in tanks and CDUs, transfer rates, processing limits, operation costs, crude margins, demands, and specifications on crude feed quality. This material is available free of charge via the Internet at http://pubs.acs.org.
■
AUTHOR INFORMATION
Corresponding Author
Figure 17. CDU feed flow rate profiles for schedule B in example 1.
*E-mail:
[email protected]. Tel.: +65 6516-8041. Fax: +65 6779-1936.
5. DISCUSSION AND CONCLUSIONS Evolutionary algorithms such as GA have been widely studied for many engineering problems where their ability to easily handle nonlinearities, to generate multiple solutions, and to cope with multiple conflicting objectives are important. In contrast, math programming based approaches offer a systematic way to handle constraints and a better speed of convergence and they can guarantee optimality of solution for linear problems. Therefore, stochastic algorithms have traditionally received limited attention for large-scale combinatorial problems such as scheduling involving a large number (thousands) of variables and constraints. In many process operations problems, most of the variables and constraints are required to reflect the process network. Here, we propose a graph-based scheme that exploits the underlying structure of the process network and thus offers a much smaller representation of the problem. This sparse representation then enables more efficient optimization in general. Here, we have studied the benefits of the structure-based representation in the context of genetic algorithms. Specifically, we have developed a new GA called structure adapted GA (SAGA) where the chromosome explicitly uses the graph-based representation. The sparse representation offered by SAGA translates to two distinct benefits: (1) the size of the GA chromosome is significantly reduced and (2) the number of penalty functions required to capture the problem constraints is also much smaller. These two benefits combine to yield excellent search performance in terms of both solution quality and computational time. Of the 24 problems studied, in most cases the results were only 1.5% from the optima identified by GAMS; the computational time required by SAGA for such problems with over 10 000 variables was 21−62 min, which contrasts with the reported time in days16 with conventional GAs applied to other similar sized problems. It should be noted that, like any stochastic algorithm, SAGA by its inherent nature cannot guarantee the global optimal solution. However, it can identify good (near-optimal) solutions consistently, within a short time; it is also more robust to increasing problem size. The benefits from exploiting the graph representation would accrue to other process operations problems as well as with other stochastic optimization techniques such as simulated annealing and particle swarm optimization. In future, we plan to incorporate deterministic algorithms (such as simplex) as additional operators in SAGA to enhance the efficiency of local search. Also, new computer architectures and the resulting
Present Address §
Discipline of Polymer Science and Technology, DPT, IIT Roorkee, Saharanpur, India 247001. Notes
The authors declare no competing financial interest.
■
NOMENCLATURE am,t = weight value corresponding to mth edge in tth period in a weight subchromosome C = maximum size of the graph cm = capacity of mth edge dvk = depth of kth vertex D = total crude demand of processed E = set of edges {e1, e2, ..., em, ...} f m,t = flow through mth edge in period t G = digraph gt = penalty for constraint violation during period t Is,p,t = inflow variable corresponding to sth silo, pth parcel, and period t lchrom = length of the GA chromosome Lt = logistic cost during tth period LBm,t = lower bound of flow through mth edge during period t lstr = binaries representing one variable in binary-coded GA MH = penalty value for hard constraints MB = large value used in bracketed penalty m = margin maxgen = maximum generations used in GA Np = population in GA Os,t = outflow variable from sth silo during period t qi = cost factor Q = amount of processed crude S = safety-stock limit T = number of periods (t ∈ [1, T]) in scheduling horizon UBm,t = upper bound of flow through mth edge during period t V = set of vertices {v1, v2, ..., vk, ...} Vk,t = inventory of kth vertex during period t xc,k,t = quality specification c, at vertex k, during period t
Indices
c = quality specification i, j, k = vertex m = edge p = parcel s = tank t = period 5271
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272
Industrial & Engineering Chemistry Research
■
Article
(25) He, Y.; Hui, C.-W. A Binary Coding Genetic Algorithm for Multi-Purpose Process Scheduling: A Case Study. Chem. Eng. Sci. 2010, 65, 4816−4828. (26) Wang, K.; Löhl, T.; Stobbe, M; Engell, S. A Genetic Algorithm for Online-Scheduling of a Multiproduct Polymer Batch Plant. Comput. Chem. Eng. 2000, 24, 393−400. (27) Forrest, S. Genetic Algorithms: Principle of Natural Selection Applied to Computation. Science 1993, 261, 872−878. (28) Deo, N. Graph Theory with Applications to Engineering and Computer Science; Prentice Hall of India: New Delhi, India, 2000. (29) Mah, R. S. H. Application of Graph Theory to Process Design and Analysis. Comput. Chem. Eng. 1983, 7, 239−257. (30) Shivakumar, K.; Narasimhan, S. A Robust and Efficient NLP Formulation using Graph Theoretic Principles for Synthesis of Heat Exchanger Networks. Comput. Chem. Eng. 2002, 26, 1517−1532. (31) Sanmarti, E.; Puigjaner, L.; Holczinger, T.; Friedler, F. Combinatorial Framework for Effective Scheduling of Multipurpose Batch Plants. AIChE J. 2002, 48, 2557−2570. (32) Kim, Y.; Fan, L. T.; Yun, C.; Park, S. B.; Park, S.; Bertok, B.; Friedler, F. Graph Theoretic Approach to Optimal Synthesis of Supply Networks: Distribution of Gasoline from a Refinery. Comput.-Aided Chem. Eng. 2008, 25, 247−252. (33) Mah, R. S. H. Chemical Process Structures and Information Flows; Butterworths: Boston, 1990. (34) Reddy, P. C. P.; Karimi, I. A.; Srinivasan, R. A Novel Solution Approach for Optimizing Crude Oil Operation. AIChE J. 2004, 50, 1177−1197. (35) Kelly, J. D.; Mann, J. L. Crude Oil Blend Scheduling Optimization: An Application with Multi-Million Dollar Benefits― Part 1. Hydrocarbon Process. 2003, 82, 47−51. (36) Kelly, J. D.; Mann, J. L. Crude Oil Blend Scheduling Optimization: An Application with Multi-Million Dollar Benefits― Part 2. Hydrocarbon Process. 2003, 82, 72−79. (37) Li, J.; Li, W.; Karimi, I. A.; Srinivasan, R. Improving the Robustness and Efficiency of Crude Scheduling Algorithms. AIChE J. 2007, 53, 2659−2680. (38) Lee, H.; Pinto, J. M.; Grossmann, I. E.; Park, S. Mixed-Integer Linear Programming Model for Refinery Short-Term Scheduling of Crude Oil Unloading with Inventory Management. Ind. Eng. Chem. Res. 1996, 35, 1630−1641. (39) Jia, Z.; Ierapetritou, M. G.; Kelly, J. D. Refinery Short-Term Scheduling using Continuous Time Formulation: Crude Oil Operation. Ind. Eng. Chem. Res. 2003, 42, 3085−3097. (40) Moro, L. F. L.; Pinto, J. M. Mixed-Integer Programming Approach for Short-Term Crude Oil Scheduling. Ind. Eng. Chem. Res. 2004, 43, 85−94. (41) Li, W. K.; Hui, C. W.; Hua, B.; Tong, Z. Scheduling Crude Oil Unloading, Storage, and Processing. Ind. Eng. Chem. Res. 2002, 41, 6723−6734. (42) Reddy, P. C. P.; Karimi, I. A.; Srinivasan, R. A New Continuous Time Formulation for Scheduling Crude Oil Operation. Chem. Eng. Sci. 2004, 59, 1325−1341.
REFERENCES
(1) Kondili, E.; Pantelides, C. C.; Sargent, R. W. H. A General Algorithm for Short-Term Scheduling of Batch Operations―I. MILP Formulation. Comput. Chem. Eng. 1993, 17, 211−227. (2) Ierapetritou, M. G.; Floudas, C. A. Effective Continuous-Time Formulation for Short-Term Scheduling. 1. Multipurpose Batch Processes. Ind. Eng. Chem. Res. 1998, 37, 4341−4359. (3) Mockus, L.; Reklaitis, G. V. Continuous Time Representation Approach to Batch and Continuous Process Scheduling. 1. MINLP formulation. Ind. Eng. Chem. Res. 1999, 38, 197−203. (4) Lee, K.-H.; Park, H. I.; Lee, I.-B. A Novel Nonuniform Discrete Time Formulation for Short-Term Scheduling of Batch and Continuous Processes. Ind. Eng. Chem. Res. 2001, 40, 4902−4911. (5) Giannelos, N. F.; Georgiadis, M. C. A Simple New ContinuousTime Formulation for Short-Term Scheduling of Multipurpose Batch Processes. Ind. Eng. Chem. Res. 2002, 41, 2178−2184. (6) Maravelias, C. T.; Grossmann, I. E. New General ContinuousTime State-Task Network Formulation for Short-Term Scheduling of Multipurpose Batch Plants. Ind. Eng. Chem. Res. 2003, 42, 3056−3074. (7) Mendez, C. A.; Cerda, J.; Grossmann, I. E.; Harjunkoski, I.; Fahl, M. State-of-the- art Review of Optimization Methods for Short-Term Scheduling of Batch Processes. Comput. Chem. Eng. 2006, 30, 913−946. (8) Floudas, C. A.; Gounaris, C. E. A Review of Recent Advances in Global Optimization. J. Global Optim. 2009, 45, 3−38. (9) Holland, J. H. Adaptation in Natural and Artificial Systems; University of Michigan Press: Ann Arbor, MI, 1975. (10) Deb, K. Multi-objective Optimization Using Evolutionary Algorithms; Wiley: Chichester, U.K., 2001. (11) Coello Coello, C. A.; Lamont, G. B.; Veldhuizen, D. A. V. Evolutionary Algorithms for Solving Multi-objective Problems, 2nd ed.; Springer: New York, 2007. (12) Kasat, R. B.; Gupta, S. K. Multiobjective Optimization of Industrial Fluidized-bed Catalytic Cracking Unit (FCCU) using Genetic Algorithm with the Jumping Genes Operator. Comput. Chem. Eng. 2003, 27, 1785− 1800. (13) Multi-Objective Optimization Techniques and Applications in Chemical Engineering; Rangaiah, G. P., Ed.; Advances in Process Systems Engineering 1; World Scientific: Singapore, 2008. (14) He, Y.; Hui, C.-W. A rule-based genetic algorithm for the scheduling of single-stage multi-product batch plants with parallel units. Comput. Chem. Eng. 2008, 32, 3067−3083. (15) He, Y.; Hui, C.-W. Genetic algorithm for large-size multi-stage batch plant scheduling. Chem. Eng. Sci. 2007, 62, 1504−1523. (16) Jung, J. H.; Lee, C. H.; Lee, I.-B. A genetic algorithm for scheduling of multi-product batch processes. Comput. Chem. Eng. 1998, 22, 1725−1730. (17) Dave, D.; Zhang, N. Hybrid Method using Genetic Algorithm Approach for Crude Distillation Unit Scheduling. Process Syst. Eng. 2003, 445−450. (18) Kang, M.-G.; Kang, S.; Park, S. Integrated Scheduling for Polyvinyl Chloride Processes. Ind. Eng. Chem. Res. 2006, 45, 5729− 5737. (19) Naraharisetti, P. K.; Karimi, I. A.; Srinivasan, R. Supply Chain Redesign―Multimodal Optimization using a Hybrid Evolutionary Algorithm. Ind. Eng. Chem. Res. 2009, 48, 11094−11107. (20) Rezaei, E.; Shafiei, S. Heat Exchanger Networks Retrofit by Coupling Genetic Algorithm with NLP and ILP Methods. Comput. Chem. Eng. 2009, 33, 1451−1459. (21) Liu, B.; Wang, L.; Liu, Y.; Qian, B.; Jin, Y.-H. An Effective Hybrid Particle Swarm Optimization for Batch Scheduling of Polypropylene Processes. Comput. Chem. Eng. 2010, 34, 518−528. (22) Perez-Vazquez, M. E.; Gento-Municio, A. M.; Lourenco, H. R. Solving a Concrete Sleepers Production Scheduling by Genetic Algorithms. Eur. J. Oper. Res. 2007, 179, 605−620. (23) He, Y.; Hui, C.-W. Genetic Algorithm for Large-Size Multi-Stage Batch Plant Scheduling. Chem. Eng. Sci. 2007, 62, 1504−1523. (24) He, Y.; Hui, C.-W. Genetic Algorithm Based on Heuristic Rules for High-Constrained Large-Size Single-Stage Multi-Product Scheduling with Parallel Units. Chem. Eng. Process. 2007, 46, 1175−1191. 5272
dx.doi.org/10.1021/ie201283z | Ind. Eng. Chem. Res. 2012, 51, 5256−5272