Ind. Eng. Chem. Res. 2010, 49, 10459–10470
Rule-Based Generation of Thermochemical Routes to Biomass Conversion
Srinivas Rangarajan, Aditya Bhan,* and Prodromos Daoutidis*
Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, 421 Washington Avenue SE, Minneapolis, Minnesota 55455
* To whom correspondence should be addressed.
Biomass conversion to fuels and chemicals involves a multitude of oxygen-containing compounds and thermochemical reaction routes. A detailed elucidation of the process chemistry is, thus, a key step in understanding the reaction mechanisms and designing chemical processes in a biorefinery. In this paper, a computational tool, called Rule Input Network Generator (RING), is presented as a platform for modeling diverse homogeneous and heterogeneous chemistries in biomass conversion and automatically generating the underlying complex reaction networks. RING accepts a set of reaction rules and initial reactants as inputs and exhaustively generates the reactions of the system. The reaction center of an elementary step is represented by a SMARTS-like string and identified as a submolecular pattern in a reactant molecular graph using a pattern-matching algorithm. The reaction events are subsequently modeled as a graph transformation system. The generality of this framework was substantiated by the successful application of RING in reproducing the reaction mechanisms of different biomass conversion systems, such as acid-catalyzed dehydration of fructose, base-catalyzed esterification of triglycerides, and gas phase pyrolysis of fatty esters. 1. 
Introduction Biomass, an abundant and renewable resource of organic carbon, will play a key role in supplying the world with “green” transportation fuels and other useful organic chemicals.1-4 A biorefinery, envisaged to function akin to a petroleum refinery, will draw a feedstock from an abundant biomass source and convert it into a host of smaller and more valuable products, such as ethanol, biodiesel, and levulinic acid, in a sequence of unit operations and processes.2,3,5,6 A mainstay in the design and development of petroleum refineries are the strategies, software tools, and semiempirical physicochemical property correlations that have been developed for process modeling, design, and optimization.7-10 These are extensively used in the basic design of petroleum refineries and in subsequent process improvements. Some tools, such as plant-wide simulators and optimizers, are in principle applicable in designing biorefineries, as well. Kinetic and mechanistic models of reaction systems in petroleum refining, on the other hand, are not directly transferable to biorefineries because the process chemistry and reaction mechanisms are significantly different. Such models, therefore, will have to be developed anew for modeling the chemistry relevant in biomass conversion. Several challenges need to be addressed prior to performing kinetic or mechanistic analysis of biomass conversion systems. First, biomass has a C/O ratio of 1:1, whereas fossil fuels consist predominantly of carbon and hydrogen, implying that the chemical transformations pertaining to oxygenates, which are not well explored yet, become important. Second, biomass conversion to fuels and chemicals can involve diverse thermochemical routes,11-16 such as gas phase pyrolysis, liquid phase solution chemistry (e.g., dehydration and hydrolysis), and heterogeneous catalytic chemistry. 
Furthermore, the composition of biomass varies, depending upon the source (e.g., cellulose, hemicellulose, and lignin) and, hence, leads to a variable product distribution. Thus, to understand the reaction systems of a biorefinery, it is essential to have a tool for (a) representing generic organic compounds in a compact format, (b) describing
and generating diverse chemistries in biomass conversion, and (c) identifying potential reaction routes that exist between reactants and products. Such a tool can then be used as a precursor for kinetic modeling and for pathway and mechanistic analysis because these require a reaction network as the first step. In this paper, we present Rule Input Network Generator (RING), a computational tool that provides the user with the ability to describe a variety of heterogeneous and homogeneous chemistries in biomass conversion and to generate complex reaction networks. The tool takes in as input reactants and reaction rules of a reaction system and generates a list of all possible products and reactions on the basis of the reaction rules. RING builds on the strategies developed for modeling pyrolysis,17 combustion,18 hydrocarbon processing,19,20 and synthetic organic chemistry;21 it further implements algorithms from the domain of Cheminformatics22 to generalize the description of a reaction using a graph-transformation system, thereby streamlining the method of reaction generation. Four examples of reaction network generation, consisting of different types of chemistries relevant in biomass conversion, are presented to highlight the versatility of RING. 2. Background Computational tools with a systematic procedure for representing chemical transformations were first developed in the field of chemistry for computer-aided organic synthesis.21 LHASA23 uses heuristics-based strategies for retrosynthesis,24 and IGOR25-27 implements the Dugundji-Ugi algebraic model of BE and R matrices25 to represent molecules and reactions and combinatorially generates all possible synthetic routes for organic chemical systems. 
These tools were developed for the specific purpose of planning the synthesis of organic compounds to aid chemists in designing experiments and, therefore, are not directly applicable in reaction engineering, in which the emphasis is on building physicochemical models of reaction systems. Most reactors in a petroleum refinery are complex: the number of compounds and intermediates involved is large enough to preclude complete identification using analytical techniques, and
10.1021/ie100546t 2010 American Chemical Society Published on Web 06/03/2010
Ind. Eng. Chem. Res., Vol. 49, No. 21, 2010
the reaction network is so highly connected that the mathematical analysis of these systems becomes nontrivial. One of the first generic strategies to build practical models of such complex reaction systems was Structure Oriented Lumping (SOL), proposed initially by Mobil8,28 and expanded later by ExxonMobil.29 SOL represents a molecule as a vector of the frequency (the number of occurrences) of structural increments in that compound. These increments were identified on the basis of the analysis of petroleum feedstocks and products. A reaction modifies the reactant molecule vector and results in the product. SOL was designed to provide a pathway-level approach, and the molecular representation scheme prevents the depiction of intermediates and, consequently, elementary steps. To provide further elucidation of reaction mechanisms, reaction generation tools have been developed that utilize a set of elementary steps as formal reaction rules to construct complex networks; hence, the term “rule-based” to characterize such tools. NETGEN17 is a rule-based tool that adopts the method of BE and R matrices for modeling pyrolysis, polymerization, biochemical reactions, and silicon nanoparticle growth.30-32 It also includes an ordinary differential equations solver for kinetic modeling. Reaction Description Language (RDL)19,33 and RDL++20 are rule-based tools with an English-like language as the user-interface. In RDL, reaction rules of chemical reactions are described using the syntax of the language. RDL++ expands RDL with functionality and syntax to describe more complex rules, such as determining aromatic and allylic atoms, recognizing penta-coordinated carbonium ion intermediates, and representing multiple catalysts and catalytic centers. In many cases, elementary reaction steps have some restrictions on the nature of the reactant species. 
For example, carbonium ions are formed on solid acid catalysts only upon adsorption of paraffins because olefins, despite having sp3 carbons, rapidly equilibrate with their carbenium ion surface intermediates. In RDL and RDL++, therefore, constraints on the size and structure of the reactants and the products can be imposed in a reaction rule whenever required, on the basis of experimental data or expert knowledge. The number of reactions generated in rule-based models increases almost exponentially with the addition of every reactant and reaction rule and can, therefore, increase the execution time significantly. For example, Hsu et al.20 showed that, given a set of reaction rules for propane aromatization, increasing the allowed size of the products from C5 to C9 resulted in increasing the execution time from as little as 15 s to almost 48 h. This combinatorial explosion in the number of species and reactions can be mitigated in RDL++ by preventing the generation of reactions that are unlikely on the basis of structure and reactivity arguments. The tool NETGEN, on the other hand, prunes the network on the basis of the number of reaction steps between a given product and the initial reactants (rank-based technique34) and the relative rate of the given step with respect to a reference reaction rate (rate-based technique35). Automated reaction generation has also been used substantially in the field of combustion of hydrocarbons because detailed kinetic modeling aids the design of engines and other combustion systems. EXGAS,36 an automated tool for generation of kinetic models, was developed to model gas phase oxidation and combustion of alkanes. It adopts the method of Chinnick et al.37 to represent a molecule as a tree initiating from a root atom. The external representation of molecules in EXGAS is in the form of a 1-D string. The tools described above were designed and applied for specific chemistries: RDL and RDL++ for catalytic systems,
Figure 1. Overall structure of the automated reaction network generator.
COMGEN18 and EXGAS for combustion, and NETGEN predominantly for free radical chemistry. These tools, as a result, are generally not applicable across different chemistries. In biomass conversion, however, the potential of different thermochemical routes is still being explored, and many types of chemistries seem promising, as discussed earlier. A single platform for modeling these diverse chemistries will provide a common medium to analyze the reaction pathways of different chemistries and to compare and contrast one type of process chemistry with another. In this context, a computational tool that allows for the representation of the different atomic configurations and bonding types, nonbonding interactions, different reagents (e.g., electrophiles and nucleophiles), ligands, catalysts, and multiple catalytic sites (all chemical concepts pertaining to organic chemistry) will be highly valuable. Such a tool, in conjunction with information on thermodynamic quantities and kinetic parameters, can be used to construct and analyze possible mechanisms38 and to develop microkinetic models.39 RING includes a generic three-step framework for describing reaction rules that allows the user to model all types of chemistries applicable in biomass conversion. Specifically, the tool uses graph theory and graph-transformation systems to provide abstractions for representing molecules and reactions, respectively. Section 3 describes the basic working principles of RING through examples, and section 4 demonstrates the generative ability of the tool by considering four different chemical systems from the domain of biomass conversion. A detailed description of the underlying theory and algorithms will be the subject of a subsequent paper. 3. RING: a Description RING, as discussed in sections 1 and 2, is a rule-based automated reaction network generator. 
The set of initial reactants and a set of elementary steps as reaction rules are the essential inputs to RING, which then generates a list of possible reactions. Any rule-based tool requires a scheme for representing molecules and reaction rules and a procedure for generating the reactions. RING, for this reason, is composed of three components (Figure 1): a molecular representation system that is used to depict all molecules (including reactants, products, and intermediates), a scheme for describing reaction rules unambiguously, and a network generator that manages the process of construction of reaction networks based on the reaction rules. Both the network generator and the depiction scheme for reaction rules are aided by other subcomponents; the reaction rule description contains a pattern representation scheme for describing specific fragments in a molecule; and the network generator contains an internal molecular representation scheme
using graph theory, a graph-transformation system for representing the elementary reaction steps, and a control loop for the generation of all possible reactions. Each of these three components is described in detail below.
3.1. Compact Molecular Representation. A molecular representation scheme is essential for input, storage, retrieval, and output of molecules. The number of molecules (reactants, products, and intermediates) in a complex reaction system is large, typically thousands in number. A representation scheme, consequently, should consume as little memory as possible and yet preserve the atom connectivity, electronic, and atomic information of the molecule. A character-string-based representation scheme, based on SMILES,40 has been used previously.18,20,33 SMILES is a string representation scheme for molecules with a prescribed set of rules for depicting molecules. RING employs an adapted form of standard SMILES. Adaptations were made to represent additional electronic configurations (atomtypes), such as free radicals and carbonium ions, that were not represented in the original version. Figure 2 shows SMILES-like string equivalents for simple compounds. A given compound can have multiple string representations. For example, ethanol could be represented either as “CCO” or as “OCC”; a unique string representation for every molecule, therefore, requires a “canonical” string, which is generated in RING using the CANGEN algorithm.41 This algorithm ranks the atoms in a molecule on the basis of their invariant atomic properties (such as atomic weight, atomic number, charge, valency, unpaired electrons, and lone pairs) and those of their neighboring atoms. The ranks are then used to construct the canonical SMILES-like string.
3.2. Unambiguous Description of Reaction Rules. The reaction rules of a rule-based generation tool convey the elementary steps of the system being modeled. The framework for description of reaction rules should allow an unambiguous specification of the atoms and bonds participating in a reaction. In addition, the framework should also have a provision for preventing a combinatorial explosion. With these goals in mind, a framework with a three-step procedure for reaction description was adopted. Consider the example of adsorption of a ketone on an acid catalyst, as shown in Figure 3.

Figure 2. Sample external string representation adapted from SMILES40 used in RING. ‘1’ is a ring identifier. For benzene, this implies that the first and the last carbon atoms given in the string are connected by a bond, thus forming a ring. Similarly, the oxygen and the first carbon in furan have a bond, thus forming a ring. Note also that all aromatic atoms are represented in lowercase letters, and others are represented in capital letters, to provide this additional information of aromaticity of the molecule and atoms. Atoms with charges and unpaired electrons are enclosed within square brackets.

Figure 3. Adsorption of a ketone on an acid site to form a carbocation.

Table 1. Sample SMARTS-Like Patterns and Their Interpretation
pattern       interpretation
CC            two carbon atoms connected by a single bond
CO            carbon and oxygen atoms connected by a single bond
C~C           two carbon atoms connected by any type of bond
C1=O2         a carbon atom doubly bonded to an oxygen; 1 and 2 are labels given to the atoms
C1[!=O]C2     a carbon atom labeled 1 singly bonded to another carbon atom labeled 2; 1 is not connected to oxygen by a double bond

For simplicity, the catalytic center is represented in Figure 3 as H+ (a proton). First, the reactant pattern, consisting of the set of atoms and bonds participating in a reaction, is determined. In this case, the keto functional group, C=O, and the acid site, H+, constitute the reactant pattern.33,42 The reaction step proceeds with bond formation between the proton and the oxygen. The charge on the proton is transferred to the carbon, and the double bond of the keto group weakens to form a single bond, C+-O. This description of the transformation operations constitutes the second step. To prevent the generation of reactions that are infeasible, additional constraints, such as those requiring the reactant to be a neutral species, need to be specified. In addition, constraints on size and structure can be imposed by either the chemical process or the nature of the catalyst; consequently, there may be an upper bound on the size of the molecule or a restriction that the molecule should not be highly branched. The description of such constraints constitutes the third and final step. Ratkiewicz et al., in their tool COMGEN,18 adopt a string representation for reactant patterns based on SMARTS,43 which contains well-defined rules and symbols to represent patterns in a molecule. RING adopts a SMARTS-like representation that is more comprehensive than that used in COMGEN because additional symbols are employed to represent atom environments and classes of atoms and bonds. Table 1 gives examples of different pattern strings and their interpretations. Reaction rules can be either unimolecular or bimolecular; the latter case consists of two reactant patterns, one for each reactant. Modifications in structure and electronic configuration comprise the description of transformation operations. 
Structural changes are transformations that increase or decrease the bond order and change connectivity (e.g., the formation and cleavage of a bond), whereas changes in electronic configuration incorporate changes in charge or electron density of the atoms participating in the reaction (such as the neutral carbon acquiring a positive charge and becoming a carbocation). Labels, such as 1 and 2 of the pattern C1=O2 in Table 1, are given to the atoms of the reactant pattern for their identification when describing the transformation operations. For example, in the case of adsorption of a ketone, the pattern C1=O2 describes the carbonyl group as the reactant pattern (with carbon labeled 1 and oxygen labeled 2), whereas the transformation operations include modifying the atomtype of 1 to C+ and decreasing the bond order of the bond between atoms 1 and 2. The constraints of a rule can be at the molecular level or at the level of the atom. Molecular constraints can be further
Table 2. Examples of Boolean Constraint Expressions
constraint expression               interpretation
{s < 6}&!{r}                        size less than 6 heavy atoms and not a ring (linear)
{q0}&{C = C}                        neutral molecule and olefinic
(!{r}&{s < 9}) | ({r}&{s < 7})      linear with size less than 9 OR cyclic with size less than 7

Table 3. Reaction Rule for Adsorption of Ketone on an Acid Site
Reactant Pattern
  1. C1[!H][!O]=O2
  2. H+3
Transformation Operations
  1. modify atomtype of 3 to H
  2. modify atomtype of 1 to C+
  3. decrease bond order of bond (1, 2)
  4. connect atoms 2 and 3
Constraints
  pattern 1 - {s < 6}&!{r}&{q0}

classified as constraints on charge, size, or structure; for example, constraints such as a molecule being neutral or acyclic. These constraints are specified as strings with a defined set of symbols and are interpreted by the network generator. RING allows multiple molecular constraints to be combined into complex Boolean strings, as shown in Table 2, thereby providing additional leverage in describing reaction rules. The constraints on the atom pertain to conditions on the local environment around a specific atom and are specified in the pattern strings. For example, the pattern C1[!=O]C2, which indicates a pattern of two carbons singly bonded to each other, includes an atom-level constraint that forbids atom 1 from being a carbonyl carbon. The framework thus offers flexibility in describing reaction rules to varying levels of detail, from providing just the minimal information comprising only the reactant pattern and the transformations to a highly detailed description having complex constraints governing the rule. Table 3 summarizes the reaction rule for the case shown in Figure 3. Two patterns are created: one for the keto group and the second for the proton. Four transformation operations describe the changes in bonding and electronic configuration, whereas the constraint on the first pattern requires the reactant to be of size less than 6, acyclic, and neutral. The atom-level constraints on atom 1, [!H] and [!O], prohibit the atom from being singly bonded to hydrogen or connected to oxygen atoms and consequently prevent aldehyde, acid, and ester groups from participating in such a reaction. Table 3 is representative of the reaction rule and is not the actual form of the input to RING. These rules are currently written as C++ commands directly into the code of RING; a user interface is currently being developed and is discussed briefly later.
3.3. Network Generator. 
The user inputs the initial reactants (as SMILES-like strings), defines the reaction rules (also called reactiontypes), and can also provide a list of global constraints that are to be satisfied by all molecules. The overall process of generation of reactions, based on these rules, is managed by the network generator, which performs three functions. First, it maintains lists of molecules (reactants, intermediates, and products), reactiontypes, and reactions. Second, it creates internal representations of molecules and patterns and finds matches of the patterns in the molecule. Third, it finds all possible reactions that each molecule can undergo. The second and third functions are discussed in detail below.
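The three parts of a reaction rule (reactant patterns, transformation operations, and constraints) can be pictured as a small data structure. The following is a minimal sketch in Python, not the C++ form in which RING's rules are actually written; the names MoleculeSummary, ReactionRule, and allows are hypothetical and chosen only for illustration, and the Boolean constraint of Table 3 is written directly as a predicate rather than parsed from its string form.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class MoleculeSummary:
    """Hypothetical molecule-level descriptors that constraints inspect."""
    size: int       # number of heavy (non-hydrogen) atoms: 's' in Table 2
    has_ring: bool  # True for cyclic molecules: 'r' in Table 2
    charge: int     # net charge; 0 corresponds to 'q0' in Table 2


Constraint = Callable[[MoleculeSummary], bool]


@dataclass
class ReactionRule:
    """Hypothetical container for the three parts of a reaction rule."""
    name: str
    patterns: List[str]      # SMARTS-like reactant patterns
    operations: List[Tuple]  # (operation name, atom labels or arguments)
    constraints: Dict[int, Constraint] = field(default_factory=dict)

    def allows(self, reactants: List[MoleculeSummary]) -> bool:
        """Check each constrained pattern against its matched reactant."""
        return all(check(reactants[i]) for i, check in self.constraints.items())


# The rule of Table 3; '=' denotes the double bond of the keto group.
ketone_adsorption = ReactionRule(
    name="adsorption of ketone on an acid site",
    patterns=["C1[!H][!O]=O2", "H+3"],
    operations=[
        ("modify_atomtype", 3, "H"),
        ("modify_atomtype", 1, "C+"),
        ("decrease_bond_order", 1, 2),
        ("connect", 2, 3),
    ],
    # {s < 6}&!{r}&{q0}, imposed on pattern 1 (index 0 here)
    constraints={0: lambda m: m.size < 6 and not m.has_ring and m.charge == 0},
)

acetone = MoleculeSummary(size=4, has_ring=False, charge=0)
cyclohexanone = MoleculeSummary(size=7, has_ring=True, charge=0)
proton = MoleculeSummary(size=0, has_ring=False, charge=1)

print(ketone_adsorption.allows([acetone, proton]))        # small, acyclic, neutral
print(ketone_adsorption.allows([cyclohexanone, proton]))  # cyclic and too large
```

Keeping the constraints separate from the patterns mirrors the distinction drawn above between molecular constraints and the atom-level constraints embedded in the pattern strings.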
Figure 4. Sample patterns and their instances in 1-hydroxypropan-2-one.
The internal representation of molecules and patterns is based on chemical graph theory.44 In this representation scheme, atoms are nodes and bonds are the edges of a graph. The nodes and the edges of this graph have attributes specifically associated with them: the nodes (atoms) contain atomic information and the electronic configuration, and the edges (bonds) contain bond order information of the bonds connecting the atoms. The nodal attributes allow different electronic configurations (or atomtypes) of a given type of element to be assigned for each atom. For example, carbon can exist as neutral carbon C, carbenium ion C+, radical C•, carbene C:, etc. The atomtype of an atom can, hence, be modified in a reaction to take up any other allowed electronic configuration. When a molecule object is created, its SMILES-like string is interpreted and a molecular graph is created. Graph-based algorithms are used to determine further characteristics of the molecule. For example, additional functions for generating the unique SMILES, finding all rings45 in the molecule, and detecting aromaticity46 use graph algorithms such as depth-first and breadth-first traversal. When a pattern object is created, its SMARTS-like string is parsed to generate the pattern graph. The instances of a specific pattern in a given molecule are found using the Ullmann algorithm for subgraph isomorphism.47 This algorithm finds all parts of the molecular graph (also called subgraphs) identical to the pattern graph. Figure 4 shows some examples of matches in a molecule wherein atoms are labeled for the sake of easy reference. For example, a pattern C=O creates a pattern graph corresponding to the keto group; the Ullmann algorithm subsequently finds identical subgraphs of this pattern in the molecule and returns the subgraph C2=O3 as a match. 
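The pattern search just described can be illustrated with a short sketch. It uses plain backtracking rather than the Ullmann algorithm with its refinement step, so it shows the matching idea and not RING's implementation. The class LabeledGraph and the function find_matches are hypothetical names, hydrogens are left implicit, and the atom numbering of 1-hydroxypropan-2-one is an assumption chosen to agree with the C2=O3 match quoted in the text.

```python
from typing import Dict, List, Tuple


class LabeledGraph:
    """Atoms are nodes carrying an atomtype; bonds are edges carrying a bond order."""

    def __init__(self, atoms: Dict[int, str], bonds: Dict[Tuple[int, int], int]):
        self.atoms = dict(atoms)
        self.bonds = {frozenset(pair): order for pair, order in bonds.items()}

    def bond_order(self, a: int, b: int) -> int:
        # 0 means the two atoms are not bonded
        return self.bonds.get(frozenset((a, b)), 0)


def find_matches(pattern: LabeledGraph, molecule: LabeledGraph) -> List[Dict[int, int]]:
    """Return every mapping {pattern atom: molecule atom} that preserves
    atomtypes and the order of every bond present in the pattern."""
    order = sorted(pattern.atoms)
    matches: List[Dict[int, int]] = []

    def compatible(p: int, m: int, mapping: Dict[int, int]) -> bool:
        if pattern.atoms[p] != molecule.atoms[m]:
            return False
        for q, mapped in mapping.items():
            required = pattern.bond_order(p, q)
            if required and molecule.bond_order(m, mapped) != required:
                return False
        return True

    def extend(mapping: Dict[int, int]) -> None:
        if len(mapping) == len(order):
            matches.append(dict(mapping))
            return
        p = order[len(mapping)]  # next pattern atom to place
        for m in sorted(molecule.atoms):
            if m not in mapping.values() and compatible(p, m, mapping):
                mapping[p] = m
                extend(mapping)
                del mapping[p]   # backtrack

    extend({})
    return matches


# 1-hydroxypropan-2-one, CH3-C(=O)-CH2-OH, numbered C1, C2, O3, C4, O5 (assumed)
hydroxyacetone = LabeledGraph(
    atoms={1: "C", 2: "C", 3: "O", 4: "C", 5: "O"},
    bonds={(1, 2): 1, (2, 3): 2, (2, 4): 1, (4, 5): 1},
)
keto = LabeledGraph(atoms={1: "C", 2: "O"}, bonds={(1, 2): 2})

print(find_matches(keto, hydroxyacetone))
```

A symmetric pattern such as CC is reported once per orientation by this sketch, so duplicate reaction centers would need to be filtered in a fuller implementation.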
This internal representation, along with the framework for describing reaction rules discussed in section 3.2, inherently provides a method for reaction generation based on graph transformation.48 The instances of a reactant pattern, also known as reaction centers, are first identified in the molecule. If a molecule has the appropriate reaction center, then all the constraints of the reaction rule are evaluated on the molecule. For bimolecular reaction rules, a second molecular graph corresponding to the coreactant is taken, and instances of the second reactant pattern are identified. If the reactants satisfy all the constraints, the transformation operations of the reaction rule are applied on the molecular graphs. The attributes of the atoms and the bonds of the reaction center in the molecule are modified according to the reaction rule. The connected components of the transformed graphs are the products of the reaction. Each instance of the reactant pattern leads to a different reaction. For example, Figure 5 extends the case shown in Figure 3 and shows two instances of the reactant pattern leading to two possible reactions. The number of reactions for a bimolecular reactiontype is the product of the number of possible reaction centers of the two reactants. The network generator also generates all possible reactions allowed by the reactiontypes, which is essential for mechanistic modeling of any reaction system. Figure 6 shows the flowchart for the overall process of reaction generation. The initial reactants are all stored in a list called the unprocessed molecule list. The process of generation of reactions begins with “popping” a molecule from this list. The generator contains a list of constraints, called the global constraints, that are to be satisfied by all molecules. For example, a standard global constraint used in all the examples shown in section 4 is to forbid consecutive double bonds such as C=C=C in all molecules. If the molecule satisfies all the constraints, it is placed in a list called the processed molecule list. For each defined reactiontype, this molecule is tested for possible reactions. This is done using the graph-rewriting technique described above. The new reactions are stored, and all product molecules not present in either of the molecule lists are added to the unprocessed molecule list. For bimolecular reactiontypes, the second molecule is taken from the processed molecule list. Once all reactiontypes are considered, the next molecule in the unprocessed molecule list is popped, and the entire sequence of steps described above is repeated. The overall process of generation ends when the unprocessed list is empty. The final output is a list of all reactions that are generated.

Figure 5. Adsorption of dicarbonyl on an acid site. There are two instances of the keto group pattern C=O, which result in two possible adsorption steps.
Figure 6. Flowchart for the generation of reactions.
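The control loop summarized in Figure 6 can be sketched as a worklist algorithm. The version below is a minimal illustration under stated assumptions, not RING's code: molecules are plain strings, each rule is a hypothetical Python function that returns lists of product tuples, bimolecular rules are assumed to be symmetric in their two reactants, and the toy chemistry (cracking and coupling of carbon chains written as runs of "C") is invented purely to exercise the loop.

```python
from collections import deque


def generate_network(initial, uni_rules, bi_rules, passes_global_constraints):
    """Worklist loop: pop a molecule, test it, apply every rule, queue new products."""
    unprocessed = deque(initial)  # molecules still to be examined
    processed = []                # molecules examined and accepted
    known = set(initial)          # everything present in either list
    reactions, recorded = [], set()

    def record(name, reactants, products):
        rxn = (name, tuple(sorted(reactants)), tuple(sorted(products)))
        if rxn not in recorded:
            recorded.add(rxn)
            reactions.append(rxn)
        for product in products:
            if product not in known:
                known.add(product)
                unprocessed.append(product)

    while unprocessed:
        mol = unprocessed.popleft()
        if not passes_global_constraints(mol):
            # Rejected molecules are never processed; a reaction that
            # produced one would still be listed in this simple sketch.
            continue
        processed.append(mol)
        for name, rule in uni_rules.items():
            for products in rule(mol):
                record(name, (mol,), products)
        for name, rule in bi_rules.items():
            for partner in list(processed):  # second reactant; includes mol itself
                for products in rule(mol, partner):
                    record(name, (mol, partner), products)
    return processed, reactions


def crack(chain):
    """Toy unimolecular rule: split a chain into two non-empty fragments."""
    return [(chain[:k], chain[k:]) for k in range(1, len(chain) // 2 + 1)]


def couple(a, b, max_size=4):
    """Toy bimolecular rule with a rule-level size constraint on the product."""
    return [(a + b,)] if len(a) + len(b) <= max_size else []


molecules, reactions = generate_network(
    initial=["CCC"],
    uni_rules={"crack": crack},
    bi_rules={"couple": couple},
    passes_global_constraints=lambda m: len(m) <= 4,
)
for rxn in reactions:
    print(rxn)
```

Taking the second reactant only from the processed list means that each unordered pair of molecules is considered once, when the later of the two is popped.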
Figure 7. Elementary steps for dehydration of fructose to produce HMF.
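The graph-transformation step of section 3.3 can be made concrete with the ketone adsorption rule of Table 3. The sketch below is illustrative only: the function names apply_rule and connected_components, the dictionary-based graph, and the atom numbering of acetone are assumptions, and hydrogens other than the acid proton are left implicit. It applies the four transformation operations to one matched reaction center and then reads off the products as connected components.

```python
def apply_rule(atoms, bonds, operations, match):
    """Apply transformation operations to copies of the atom and bond tables.
    `match` maps pattern labels to atom ids in the combined reactant graph."""
    atoms = dict(atoms)
    bonds = {frozenset(pair): order for pair, order in bonds.items()}
    for op, *args in operations:
        if op == "modify_atomtype":
            label, new_type = args
            atoms[match[label]] = new_type
        elif op == "decrease_bond_order":
            key = frozenset(match[label] for label in args)
            bonds[key] -= 1
            if bonds[key] == 0:
                del bonds[key]  # bond cleaved
        elif op == "connect":
            bonds[frozenset(match[label] for label in args)] = 1
        else:
            raise ValueError(f"unknown operation: {op}")
    return atoms, bonds


def connected_components(atoms, bonds):
    """Each connected component of the graph is one molecule."""
    neighbors = {a: set() for a in atoms}
    for pair in bonds:
        a, b = tuple(pair)
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in sorted(atoms):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            a = stack.pop()
            if a not in component:
                component.add(a)
                stack.extend(neighbors[a] - component)
        seen |= component
        components.append(component)
    return components


# Acetone (C1, C2, O3, C4; numbering assumed) plus the acid site H+ as atom 5
atoms = {1: "C", 2: "C", 3: "O", 4: "C", 5: "H+"}
bonds = {(1, 2): 1, (2, 3): 2, (2, 4): 1}

# Transformation operations of Table 3, in terms of pattern labels 1, 2, 3
operations = [
    ("modify_atomtype", 3, "H"),
    ("modify_atomtype", 1, "C+"),
    ("decrease_bond_order", 1, 2),
    ("connect", 2, 3),
]
match = {1: 2, 2: 3, 3: 5}  # label 1 -> carbonyl C, label 2 -> O, label 3 -> proton

new_atoms, new_bonds = apply_rule(atoms, bonds, operations, match)
print(new_atoms)
print(connected_components(new_atoms, new_bonds))
```

Before the transformation the graph has two components (the ketone and the proton); afterward it has one, the protonated carbocation of Figure 3.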
4. Results and Discussion A description of RING was provided in section 3. Four examples of reaction generation from the domain of biomass conversion are presented here to highlight the ability of this network generator to model different kinds of chemistries. These are (a) acid-catalyzed dehydration of fructose to form 5-hydroxymethyl furfural (HMF), (b) base-catalyzed transesterification of triglycerides, (c) gas phase pyrolysis of fatty esters, and (d) acid-catalyzed hydrolysis of HMF to produce levulinic acid. These four examples, which are topics of current research, portray the diverse chemistries characteristic of biomass conversion systems and thus form a good basis for testing the scope of RING. In all these cases, RING reproduced the mechanisms reported in the literature and, in addition, generated other possible reactions. The examples were generated on a Dell Precision T3400 workstation with a Q6600 Intel Core 2 Quad 2.6 GHz processor. 4.1. Acid-Catalyzed Dehydration of Fructose. 5-Hydroxymethyl furfural, or HMF, has been identified as a potential green fuel,11 and acid-catalyzed dehydration of fructose is one of the proposed routes for its production.49 The elementary steps50 involved in the conversion of fructose to HMF are described in Figure 7. RING then generated all possible reactions that can occur in the system within the given set of reaction rules and
Figure 8. The reaction pathway for dehydration of fructose to form HMF. The SMILES-like representation of the reactants and products is also given.

Figure 9. Elementary steps of base-catalyzed transesterification of triglycerides.

Table 4. Execution Details for the Examples
system                   execution time (s)   no. of molecules   no. of reactions
fructose to HMF          26                   559                1528
transesterification      4                    89                 140
pyrolysis                2                    54                 67
HMF to levulinic acid    366                  4456               12253

listed over 1500 unique reactions in addition to reproducing the mechanism reported by Antal et al.50 (Figure 8).
4.2. Base-Catalyzed Transesterification of Triglycerides. Transesterification of triglycerides by methanol, under basic conditions, is the preferred way of producing biodiesel.11 In this process, the triglycerides are broken down to diglycerides, monoglycerides, and finally glycerol. The mechanism11,51 involves four steps, as shown in Figure 9. These elementary steps, along with the triglyceride shown in Figure 10, were the inputs. RING generated all possible reactions with these four steps, including the mechanism reported in the literature.51
4.3. Pyrolysis of Fatty Esters. Pyrolysis is the thermal decomposition of large molecules into smaller products via gas phase free radical chemistry. Pyrolysis of fatty acid esters leads to the formation of linear alkanes and aromatics.11 The mechanism provided by Schwab et al.52 consists of 10 elementary steps, as shown in Figure 11. The first step involving the formation of a diene is not an elementary step and proceeds through an allylic intermediate. For the sake of simplicity, the additional steps were ignored. Similarly, the dehydrogenation of cyclohexene shown in step 6 is not an elementary step and has been considered as a single step for simplicity. These two simplifications indicate that reaction rules need not always be elementary, and nonelementary rules can also be described if they are unimolecular or bimolecular. Figure 12 shows the
reactions that comprise the pyrolysis mechanism52 for ethyl oleate as the initial reactant. RING again reproduced the mechanism reported by Schwab et al.52

4.4. Levulinic Acid from HMF. Levulinic acid is the precursor for potential oxygenated fuel additives11 and can be synthesized from HMF in an acid-catalyzed reaction,53 producing formic acid as a byproduct. The elementary steps involved in the generation of levulinic acid from HMF are given in Figure 13. RING generated more than 12 000 reactions; those that constitute the mechanism given in the literature53 are shown in Figure 14.

4.5. Discussion. In all four examples, global constraints prohibited consecutive double bonds and cations. The maximum allowable molecule size was set to 25 heavy atoms in all cases except HMF to levulinic acid, for which a tighter limit of 11 heavy atoms was set on the basis of the mechanism reported in the literature. The number of reactions and molecules (reactants, products, and intermediates) generated in each case is given in Table 4.

The data in Table 4 show that RING can construct reaction systems of diverse chemistry and varied size. It can, therefore, be used to construct a variety of systems pertaining to biomass conversion. The execution time was only a few minutes in all cases and as little as 2 s for pyrolysis, indicating that large systems can be constructed in a reasonable time.

The treatment of catalysis in the examples considered in this paper, though homogeneous, can be used to model heterogeneous catalytic systems as well. For example, the acid site of a solid acid catalyst can be represented as H+ for convenience. This representation can also be extended to all surface species because they can be represented by the corresponding ions. For example, surface alkoxides can be represented as carbenium ions.20,33 In RING, the carbonium ion is represented as C* to distinguish it from the positively charged carbenium ion. This enables RING to describe chemistries such as alkane activation on zeolites.

Ind. Eng. Chem. Res., Vol. 49, No. 21, 2010

Figure 10. Mechanism of transesterification of triglycerides to form diglycerides adapted from the literature.11,51 The SMILES-like strings of the reactants and the products of each reaction, as generated by RING, are also shown.

Figure 11. Elementary steps occurring in pyrolysis of a fatty acid ester. Note that steps 1 and 6 are not elementary and can be further resolved into elementary steps, if necessary.
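The global constraints described in Section 4.5 (no consecutive double bonds and a cap on the number of heavy atoms) can be illustrated with a minimal Python sketch. This is not RING's C++ implementation; the molecule encoding (a list of element symbols plus `(i, j, order)` bond tuples) and all function names are hypothetical, and "consecutive double bonds" is interpreted here, as an assumption, to mean one atom taking part in two or more double bonds.

```python
from collections import defaultdict

# Hypothetical minimal encoding (not RING's): atoms are element symbols,
# bonds are (atom_index_i, atom_index_j, bond_order) tuples.

def heavy_atom_count(atoms):
    """Number of non-hydrogen atoms in the molecule."""
    return sum(1 for element in atoms if element != "H")

def has_consecutive_double_bonds(bonds):
    """True if any atom takes part in two or more double bonds (e.g. C=C=C).

    Assumption: this is what 'consecutive double bonds' means here.
    """
    double_bonds_at = defaultdict(int)
    for i, j, order in bonds:
        if order == 2:
            double_bonds_at[i] += 1
            double_bonds_at[j] += 1
    return any(count >= 2 for count in double_bonds_at.values())

def satisfies_global_constraints(atoms, bonds, max_heavy_atoms=25):
    """Accept a candidate product only if it passes both global constraints."""
    return (heavy_atom_count(atoms) <= max_heavy_atoms
            and not has_consecutive_double_bonds(bonds))

# Allene (C=C=C) is rejected; propene (C=C-C) is accepted. Hydrogens omitted.
allene = (["C", "C", "C"], [(0, 1, 2), (1, 2, 2)])
propene = (["C", "C", "C"], [(0, 1, 2), (1, 2, 1)])
print(satisfies_global_constraints(*allene))
print(satisfies_global_constraints(*propene))
```

In a network generator, a check of this kind would be applied to every product before it is added to the species list; passing `max_heavy_atoms=11` would mirror the tighter limit used for the HMF to levulinic acid example.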
The implementation of SMARTS-like strings, constraint expressions, and a graph transformation system makes the framework for describing reactions in RING generic. The internal
Figure 12. Pyrolysis mechanism of ethyl oleate adapted from the mechanism of Schwab et al.52 The SMILES-like strings of the reactants and the products of each reaction, as generated by RING, are also shown.
graphical representation of molecules provides an abstraction that represents the two constituents of a molecule that participate in a reaction: the atoms and the bonds. The ability of RING to handle diverse chemistries is a result of these two features, which allow for (1) representing every observed configuration of an atom through its attributes; (2) representing modifications of the electronic configuration of an atom as changes in attributes during the graph transformation step; (3) reflecting changes in the bonding of a molecule by modifying the edge attributes in a graph transformation step; and (4) describing constraints, either using constraint expressions to describe molecular constraints or incorporating atomic constraints within SMARTS-like strings.

Table 4 indicates that a large number of reactions may be generated for certain systems. It is likely that not all of them will be significant under a given reaction condition. It is, therefore, important to be able to prune a reaction network to a manageable size. In RING, expert knowledge or additional information can be used either to remove insignificant reaction rules or to include limiting constraints in reaction rules that prevent the generation of insignificant reactions. The additional information could include theoretical calculations or experimental observations.
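The attribute-based view of a reaction described above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions, not RING's data structures: the `Atom` and `MolGraph` classes and the `protonate_double_bond` function are hypothetical names, and the chosen reaction (addition of H+ across a C=C bond to give a carbenium ion) is simply a convenient example of a graph transformation that changes both node attributes (charges) and edge attributes (bond orders).

```python
from dataclasses import dataclass, field

@dataclass
class Atom:
    element: str
    charge: int = 0  # node attribute that a reaction may modify

@dataclass
class MolGraph:
    atoms: list = field(default_factory=list)  # nodes
    bonds: dict = field(default_factory=dict)  # frozenset({i, j}) -> bond order

    def add_atom(self, element, charge=0):
        self.atoms.append(Atom(element, charge))
        return len(self.atoms) - 1

    def set_bond(self, i, j, order):
        self.bonds[frozenset((i, j))] = order

    def bond_order(self, i, j):
        return self.bonds.get(frozenset((i, j)), 0)

def protonate_double_bond(mol, proton, c_attach, c_cation):
    """Graph transformation: H+ adds to one carbon of a C=C bond."""
    # Check that the reaction center is present before transforming.
    if mol.bond_order(c_attach, c_cation) != 2 or mol.atoms[proton].charge != 1:
        raise ValueError("reaction center not matched")
    mol.set_bond(c_attach, c_cation, 1)  # edge attribute: double -> single
    mol.set_bond(proton, c_attach, 1)    # new edge: C-H bond
    mol.atoms[proton].charge = 0         # node attribute: H+ -> H
    mol.atoms[c_cation].charge = 1       # node attribute: C -> C+
    return mol

# Ethene plus a proton (hydrogens already on carbon are omitted for brevity).
mol = MolGraph()
c1 = mol.add_atom("C")
c2 = mol.add_atom("C")
h = mol.add_atom("H", charge=1)
mol.set_bond(c1, c2, 2)
protonate_double_bond(mol, h, c1, c2)
print(mol.bond_order(c1, c2), mol.atoms[c2].charge)
```

The total charge is conserved by the transformation, which is one simple sanity check that can be applied to any rule written in this style.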
An important aspect of any automated reaction network generator is the comprehensiveness of the reactions that are generated. In other words, the tool should not overlook a valid reaction or generate an incorrect reaction. Hsu et al.20 indicate that proving comprehensiveness for a large reaction system is an open problem. Instead, the comprehensiveness of the reactions generated using RING can be inferred on the basis of three analyses. First, it is clear that RING does not overlook obvious reactions because it generates all the reactions reported in the literature in each of the four examples discussed above. Second, RING was tested using a small reaction system, for which it is easier to manually enumerate the number of possible reactions of certain reaction steps. The output of RING can then be analyzed to check if the requisite number of reactions of these reaction steps has been generated. Third, for a given reaction step, the number of reactions generated by different reactants of the same homologous series can be easily calculated and used to verify the output of RING. The second and third analyses have been carried out separately and are documented in the Supporting Information. Different reaction sequences leading from a reactant to a final product can be determined from the list of all reactions using RING. Since all the reaction pathways of a given molecule can be quickly determined if the elementary steps are known, RING
Figure 13. Elementary steps used for levulinic acid synthesis from HMF.
can be used to identify novel thermochemical pathways that can serve as potential synthesis routes for biofuels and chemicals. The list of generated reactions and species can be used to build mechanisms and develop microkinetic models to simulate the different conversion processes in a biorefinery. In addition to its applicability in biomass conversion, this tool can also be used to construct reaction networks of other complex systems, such as hydrocracking or combustion, because the underlying theory governing the tool is generic.

RING still has a few limitations in terms of representing molecules and intermediates. For example, there is no compact representation for polymers such as lignin. Furthermore, unlike acid catalysts that can be represented as H+ for convenience, metal catalysts cannot be represented at all. Therefore, other types of catalysts and intermediates involving multiple catalytic sites cannot be represented. In addition, hydrogen bonding of oxygen-containing intermediates cannot yet be represented. These limitations currently restrict the ability of the tool to model solid catalysis. RING is currently implemented as a C++ library of classes and functions, and the reaction rules and the initial reactants of a system are written into C++ code that can access this library. Furthermore, the extensive use
of strings containing alphanumeric or special characters for representing molecules, patterns, and constraints makes it unintuitive for a user to define reaction rules in RING. A user interface in the form of a domain-specific language that mimics graph rewriting is planned. Research is in progress to address these limitations and thereby expand the scope of application as well as the software usability of RING. Nevertheless, this tool currently offers a framework to construct reaction networks of different chemical systems in an exhaustive manner.

Figure 14. Mechanism for conversion of HMF to levulinic acid based on Horvat et al.53 The SMILES-like strings of the reactants and products of these reactions, as generated by RING, are also shown.

5. Conclusions

A rule-based reaction generation tool, RING, was presented as a single platform for constructing complex reaction networks of different chemical systems in biomass conversion. RING adopts established methods and algorithms from cheminformatics to extend the techniques used by other reaction generation tools designed for specific types of chemistry. Reaction rules in RING are described using a framework with a three-step modular procedure. Molecules have an abstract representation in the form of a molecular graph, and a graph-transformation system that applies chemical transformations on the molecular graph is the abstraction of a reaction. A generic scheme for describing constraints, to variable degrees of complexity, is implemented in RING to describe restrictions that can be imposed in a reaction system based on physical and chemical arguments. Four relevant chemical systems (namely, dehydration of fructose to produce HMF, base-catalyzed transesterification of triglycerides, gas phase pyrolysis of fatty esters, and acid-catalyzed hydrolysis of HMF to form levulinic acid) that are representative of the diverse types of chemistry potentially valuable in biomass conversion were considered for exhaustive generation of reactions. RING reproduced the mechanisms reported in the literature for all these systems. RING can thus model heterogeneous and homogeneous
reactions in the thermochemical routes to biomass conversion. Some limitations, such as representing hydrogen bonding, however, still exist in the tool and form the subjects of current research undertaken to make RING more versatile.

Acknowledgment

The authors thank Dr. Shuo-Huon Hsu in OSIsoft, LLC, San Leandro, CA for technical discussions on automated reaction generation. The authors also acknowledge financial support from the Institute on the Environment (Discovery Grant: DG-000908) and Initiative for Renewable Energy (Large Grant: RL-000409) at the University of Minnesota and from the National Science Foundation Emerging Frontiers in Research and Innovation program, Grant no. 0937706.

Supporting Information Available: Discussion of two analyses to prove comprehensiveness of RING and a list of reactions generated for the two analyses. This material is available free of charge via the Internet at http://pubs.acs.org.

Literature Cited

(1) Schubert, C. Can biofuels finally take center stage. Nat. Biotechnol. 2006, 24 (7), 777.
(2) Ragauskas, A. J.; Williams, C. K.; Davison, B. H.; Britovsek, G.; Cairney, J.; Eckert, C. A.; Frederick, W. J.; Hallett, J. P.; Leak, D. J.; Liotta, C. L.; Mielenz, J. R.; Murphy, R.; Templer, R.; Tschaplinski, T. The path forward for biofuels and biomaterials. Science 2006, 311 (5760), 484.
(3) Regalbuto, J. R. Cellulosic Biofuels-Got Gasoline. Science 2009, 325 (5942), 822.
(4) U.S. DOE; Biomass as feedstock for a bioenergy and bioproducts industry: The technical feasibility of a billion-ton annual supply, http://www1.eere.energy.gov/biomass/pdfs/final_billionton_vision_report2.pdf, April 2005 (accessed January, 2010).
(5) Clark, J. H.; Budarin, V.; Deswarte, F. E. I.; Hardy, J. J. E.; Kerton, F. M.; Hunt, A. J.; Luque, R.; Macquarrie, D. J.; Milkowski, K.; Rodriguez, A.; Samuel, O.; Tavener, S. J.; White, R. J.; Wilson, A. J. Green Chemistry and the biorefinery: a partnership for a sustainable future. Green Chem. 2006, 8, 853.
(6) Petrus, L.; Noordermeer, M. A. Biomass to biofuels, a chemical perspective. Green Chem. 2006, 8 (10), 861.
(7) Ho, T. C. Kinetic Modeling of Large-Scale Reaction Systems. Catal. Rev. 2008, 50 (3), 287–378.
(8) Quann, R. J.; Jaffe, S. B. Building useful models of complex reaction systems in petroleum refining. Chem. Eng. Sci. 1996, 51 (10), 1615.
(9) Ghosh, P.; Hickey, K. J.; Jaffe, S. B. Development of a detailed gasoline composition-based octane model. Ind. Eng. Chem. Res. 2006, 45 (1), 337.
(10) Moro, L. F. L. Process technology in the petroleum refining industry - current situation and future trends. Comput. Chem. Eng. 2003, 27, 1303.
(11) Huber, G. W.; Iborra, S.; Corma, A. Synthesis of Transportation Fuels from Biomass: Chemistry, Catalysts, and Engineering. Chem. Rev. 2006, 106, 4044.
(12) Corma, A.; Iborra, S.; Velty, A. Chemical Routes for the Transformation of Biomass into Chemicals. Chem. Rev. 2007, 107, 2411.
(13) Chheda, J. N.; Huber, G. W.; Dumesic, J. A. Liquid-Phase Catalytic Processing of Biomass-Derived Oxygenated Hydrocarbons to Fuels and Chemicals. Angew. Chem., Int. Ed. 2007, 46, 7164–7183.
(14) Carlson, T. R.; Vispute, T. P.; Huber, G. W. Green Gasoline by Catalytic Fast Pyrolysis of Solid Biomass Derived Compounds. ChemSusChem 2008, 1, 397.
(15) Dauenhauer, P. J.; Dreyer, B. J.; Degenstein, N. J.; Schmidt, L. D. Millisecond Reforming of Solid Biomass for Sustainable Fuels. Angew. Chem., Int. Ed. 2007, 46, 5864.
(16) Lin, Y.-C.; Huber, G. W. The critical role of heterogeneous catalysis in lignocellulosic biomass conversion. Energy Environ. Sci. 2009, 2, 68.
(17) Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer-generated pyrolysis modeling: on the fly generation of species, reactions and rates. Ind. Eng. Chem. Res. 1994, 33 (4), 790.
(18) Ratkiewicz, A.; Truong, T. N. Application of chemical graph theory for automated mechanism generation. J. Chem. Inf. Model. 2003, 43, 36.
(19) Prickett, S. E.; Mavrovouniotis, M. L. Construction of complex reaction systems. 2. Molecule manipulation and reaction application algorithms. Comput. Chem. Eng. 1997, 21 (11), 1237.
(20) Hsu, S. H.; Krishnamurthy, B.; Rao, P.; Zhao, C. H.; Jagannathan, S.; Venkatasubramanian, V. A domain-specific compiler theory based framework for automated reaction network generation. Comput. Chem. Eng. 2008, 32 (10), 2455.
(21) Todd, M. H. Computer-aided organic synthesis. Chem. Soc. Rev. 2005, 34 (3), 247.
(22) Engel, T. Basic Overview of Cheminformatics. J. Chem. Inf. Model. 2006, 46 (6), 2267.
(23) Corey, E. J.; Long, A. K.; Rubenstein, S. D. Computer-Assisted Analysis in Organic Synthesis. Science 1985, 228 (4698), 408.
(24) Jones, M., Jr. Organic Chemistry; W. W. Norton & Company: New York, 1997.
(25) Dugundji, J.; Ugi, I. An algebraic model of constitutional chemistry as a basis for chemical computer programs. Top. Curr. Chem. 1973, 39, 19.
(26) Ugi, I.; Bauer, J.; Bley, K.; Alf, D.; Dietz, A.; Fortain, E. Computer assisted solution of chemical problems - The historical development and present state of the art of a new discipline of chemistry. Angew. Chem., Int. Ed. Engl. 1993, 32, 201.
(27) Ugi, I.; Bauer, J.; Blomvberger, C.; Brandt, J.; Dietz, A.; Fontain, E.; et al. Models, concepts, theories, and formal languages in chemistry and their use as a basis for computer assistance in chemistry. J. Chem. Inf. Comput. Sci. 1994, 34, 3.
(28) Quann, R. J.; Jaffe, S. B. Structure oriented lumping - Describing the chemistry of complex hydrocarbon mixtures. Ind. Eng. Chem. Res. 1992, 31 (11), 2483.
(29) Jaffe, S. B.; Freund, H.; Olmstead, W. N. Extension of Structure-Oriented Lumping to Vacuum Residua. Ind. Eng. Chem. Res. 2005, 44 (26), 9840.
(30) Kruse, T. M.; Wong, H.-W.; Broadbelt, L. J. Mechanistic Modeling of Polymer Pyrolysis: Polypropylene. Macromolecules 2003, 36 (25), 9594.
(31) Li, C.; Henry, C. S.; Jankowski, M. D.; Ionita, J. A.; Hatzimanikatis, V.; Broadbelt, L. J. Computational discovery of biochemical routes to specialty chemicals. Chem. Eng. Sci. 2004, 59, 5051.
(32) Wong, H.-W.; Li, X.; Swihart, M. T.; Broadbelt, L. J. Detailed Kinetic Modeling of Silicon Nanoparticle Formation Chemistry via Automated Mechanism Generation. J. Phys. Chem. A 2004, 108 (46), 10122.
(33) Prickett, S. E.; Mavrovouniotis, M. L. Construction of complex reaction systems. 1. Reaction description language. Comput. Chem. Eng. 1997, 21 (11), 1219.
(34) Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Termination of Computer-Generated Reaction Mechanisms: Species Rank-Based Convergence Criterion. Ind. Eng. Chem. Res. 1995, 34 (8), 2566.
(35) Susnow, R. G.; Dean, A. M.; Green, W. H.; Peczak, P.; Broadbelt, L. J. Rate-Based Construction of Kinetic Models for Complex Systems. J. Phys. Chem. A 1997, 101 (20), 3731.
(36) Warth, V.; Battin-Leclerc, F.; Fournet, R.; Glaude, P. A.; Come, G. M.; Scacchi, G. Computer based generation of reaction mechanisms for gas-phase oxidation. Comput. Chem. 2000, 25, 541.
(37) Chinnick, S. J.; Baulch, D. L.; Ayscough, P. B. An Expert System for Hydrocarbon Pyrolysis Reactions. Chemom. Intell. Lab. Syst. 1988, 5, 39.
(38) Dumesic, J. A. Analyses of Reaction Schemes Using De Donder Relations. J. Catal. 1999, 185, 496.
(39) Bhan, A.; Hsu, S.-H.; Blau, G.; Caruthers, J. M.; Venkatasubramanian, V.; Delgass, W. N. Microkinetic modeling of propane aromatization over HZSM-5. J. Catal. 2005, 235, 35.
(40) Weininger, D. SMILES, A chemical language and information systems. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 1988, 28 (1), 31.
(41) Weininger, D.; Weininger, A.; Weininger, J. L. SMILES. 2. Algorithm for generation of unique SMILES notation. J. Chem. Inf. Comput. Sci. 1989, 29 (2), 97.
(42) Blurock, E. S. Reaction: System for Modeling Chemical Reactions. J. Chem. Inf. Comput. Sci. 1994, 35, 607.
(43) Daylight Chemical Information Systems, Inc.; Daylight Theory Manual; 2008; http://www.daylight.com/dayhtml/doc/theory/index.html (accessed Jan 2010).
(44) Trinajstic, N. Chemical Graph Theory, 2nd ed.; CRC Press: Boca Raton, FL, 1992.
(45) Hanser, T.; Jauffret, P.; Kaufmann, G. A New Algorithm for Exhaustive Ring Perception in a Molecular Graph. J. Chem. Inf. Model. 1996, 36, 1146.
(46) Steinbeck, C.; Han, Y. Q.; Kuhn, S.; Horlacher, O.; Luttmann, E.; Willighagen, E. The Chemistry Development Kit (CDK): An open-source Java library for chemo- and bioinformatics. J. Chem. Inf. Comput. Sci. 2003, 43 (2), 493.
(47) Ullmann, J. R. An Algorithm for Subgraph Isomorphism. J. ACM 1976, 23 (1), 31.
(48) Benkö, G.; Flamm, C.; Stadler, P. F. A Graph-Based Toy Model of Chemistry. J. Chem. Inf. Comput. Sci. 2003, 43 (4), 1085.
(49) Torres, A. I.; Tsapatsis, M.; Daoutidis, P. Continuous production of 5-Hydroxymethylfurfural from fructose: a design case study. Energy and Environmental Science, submitted.
(50) Antal, M. J.; Mok, W. S. L.; Richards, G. N. Mechanism of formation of 5-(hydroxymethyl)-2-furaldehyde from D-fructose and sucrose. Carbohydr. Res. 1990, 199, 91.
(51) Schuchardt, U.; Sercheli, R.; Vargas, R. M. Transesterification of vegetable oils. J. Braz. Chem. Soc. 1998, 9, 199.
(52) Schwab, A. W.; Dystra, G. J.; Selke, E.; Sorenson, S. C.; Pryde, E. H. Diesel fuel from thermal decomposition of soybean oil. J. Am. Oil Chem. Soc. 1988, 65, 1781.
(53) Horvat, J.; Klaic, B.; Metelko, B.; Sunjie, V. Mechanism of levulinic acid formation. Tetrahedron Lett. 1985, 26, 2111.
Received for review March 8, 2010. Revised manuscript received May 19, 2010. Accepted May 20, 2010. IE100546T