Computational Framework for the Identification of Bioprivileged

Dec 11, 2018 - Xiaowei Zhou† , Zachary J. Brentzel‡ , George A. Kraus§ , Peter L. Keeling∥ , James A. Dumesic‡ , Brent H. Shanks∥⊥ , and ...
0 downloads 0 Views 4MB Size
This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

Research Article Cite This: ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

pubs.acs.org/journal/ascecg

Computational Framework for the Identification of Bioprivileged Molecules Xiaowei Zhou,† Zachary J. Brentzel,‡ George A. Kraus,§ Peter L. Keeling,∥ James A. Dumesic,‡ Brent H. Shanks,∥,⊥ and Linda J. Broadbelt*,†

Downloaded via 95.85.68.38 on January 2, 2019 at 06:52:14 (UTC). See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.



Department of Chemical and Biological Engineering, Northwestern University, 2145 Sheridan Road, Evanston, Illinois 60208, United States ‡ Department of Chemical and Biological Engineering, University of Wisconsin-Madison, 1415 Engineering Drive, Madison, Wisconsin 53706, United States § Department of Chemistry, Iowa State University, 2759 Gilman, Ames, Iowa 50011-3111, United States ∥ Center for Biorenewable Chemicals (CBiRC), Iowa State University, 1140 Biorenewables Research Laboratory Building, Ames, Iowa 50011, United States ⊥ Department of Chemical and Biological Engineering, Iowa State University, 2119 Sweeney Hall, Ames, Iowa 50011, United States S Supporting Information *

ABSTRACT: Bioprivileged molecules are biology-derived chemical intermediates that can be efficiently converted to a diversity of chemical products including both novel molecules and drop-in replacements. Bridging chemical and biological catalysis by bioprivileged molecules provides a useful and flexible new paradigm for producing biobased chemicals. However, the discovery of bioprivileged molecules has been demonstrated to require extensive experimental effort over a long period of time. In this work, we developed a computational framework for identification of all possible C6HxOy molecules (29252) that can be honed down to a manageable number of candidate bioprivileged molecules based on analysis of structural features, reactive moieties, and reactivity of species, and the evaluation of the reaction network and resulting products based on automated network generation. Required input is the structure data file (SDF) of the starting molecules and the reaction rules. On-the-fly estimation of thermodynamics by a group contribution method is introduced as a screening criterion to identify the feasibility of reactions and pathways. Generated species are dynamically linked to the PubChem database for identification of novel products and evaluation of the known products as attractive candidates. Application of the proposed computational framework in screening 29252 C6 species and identifying a list of 100 C6HxOy bioprivileged molecule candidates is presented. Each of the 100 candidate molecules falls into one of nine broad compound classes and is typically composed of carbon atoms with a different chemical environment and, as a result, distinct reactivity patterns. Sensitivity analysis of the parameters used in the filtering steps leading to the candidate molecules that were identified is discussed, and analysis of favorable structural features, reactive moieties, and functionalities of C6HxOy candidate bioprivileged molecules is performed. KEYWORDS: Bioprivileged molecules, Biobased chemicals, Computational molecule design, Automated network generation, Reactivity



order to find the intermediates that are candidates for its precursor.

SIGNIFICANCE

Bioprivileged molecules are emerging as a useful new paradigm for developing biobased chemicals. This work developed a computational framework for systematically screening and identifying C6 bioprivileged molecule candidates and associated feasible chemical pathways to biobased chemicals including novel compounds, thereby guiding experimentation and computational analysis. Analysis of the 100 bioprivileged molecule candidates identified from 29252 C6 compounds creates a paradigm for establishing a rational structure for the design of bioprivileged molecules in general. The computational framework can be extended to identification of other biological candidates including C4s and C5s. The framework also can be run in reverse if a specific final biobased product were desired in © XXXX American Chemical Society



INTRODUCTION

The current petrochemical industry is established on the basis of primary petrochemicals (i.e., olefins, aromatics, and synthesis gas), which are building blocks for chemical synthesis of a wide variety of materials and chemical products. In the future, renewable carbon processed through biorefineries, which integrate biomass conversion processes and equipment to Received: October 12, 2018 Revised: December 3, 2018 Published: December 11, 2018 A

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

hydroxymethylfurfural (5-HMF), and triacetic acid lactone (TAL) have been reported as example C6 bioprivileged molecules by Shanks and Keeling.28 With the use of TAL as a specific example, its biological production has been demonstrated by researchers,29−34 and a diversity of chemicals as its products has been reported.28,35−47 TAL demonstrates the power of a bioprivileged molecule as it would typically not be considered a valuable target for engineered microbes due to its limited direct utility. However, as an intermediate subjected to diversity-oriented synthesis that creates products as apparently disparate as sorbic acid (food preservative activity) and styrenylpyrones (antiobesity activity), it is highly attractive. Pursuing a merely retrosynthetic approach for either of these products would not likely coalesce on TAL as an important synthetic precursor. A second valuable aspect of TAL as a bioprivileged molecule is that a family of molecules can be produced with the same synthetic approach while yielding enhanced properties. Accelerating biobased chemical product development via bioprivileged molecules has the promise of introducing novel chemicals and greatly expanding the biobased chemical horizon. Unfortunately, there is no current strategy for the direct identification and development of bioprivileged molecules. Moreover, one of the principal challenges associated with any system that combines chemical and biological catalysis is to determine at what point in the selective defunctionalization of biomass carbohydrates should an intermediate molecule be passed from the biological process to the chemical process. Generally, multiple functionality is required for a bioprivileged molecule to allow diversity in product slates, but excess functionality may lead to selectivity challenges for chemical catalysis. Furthermore, identification of a bioprivileged molecule can be very challenging based on using an experimental approach alone since the typical approach is largely serendipitous targeting of a specific chemical compound or reaction process. The discovery of TAL as a promising bioprivileged molecule by the collective efforts of a group of chemists, biologists, and chemical engineers in CBiRC involved extensive experimental efforts over several years. An approach that combines heterogeneous catalysis, biological catalysis, and computational network analysis would be particularly useful for accelerating the discovery of new, highly flexible processes for the production of chemicals from biomass through bioprivileged molecules, as noted in our previous reviews.14,27,28 In this work, we developed a computational framework for systematic identification of bioprivileged molecules that is based on analysis of structural features, reactive moieties and reactivity of species, and the evaluation of the reaction network and resulting products based on automated network generation (NetGen) developed by Broadbelt et al.48 On-the-fly estimation of thermodynamics by a group contribution method is introduced as a screening criterion to identify the feasibility of the reactions and pathways. Generated species are dynamically linked to the PubChem database for identification of novel species and evaluation of the attractiveness of the products that are known. The proposed computational framework is applied in screening 29252 C6 species and identifying a preliminary list of candidate C6HxOy bioprivileged molecules. Sensitivity analysis of the screening process and analysis of the structural features are also discussed.

produce biofuels, power, heat, and a spectrum of biobased products, will be an increasingly important source for those products.1−5 With rapidly growing interest in biobased chemicals, significant advances have been achieved based on two major production strategies, biological and chemical. The biological production of chemicals has a long history, including production of several commodity chemicals from biomassderived carbohydrates on a commercial scale, with recent accomplishments related to lactic acid, 1,3-propanediol, succinic acid, and muconic acid.6−8 Additional examples of the biological production of C2−C6 platform chemicals have been extensively reviewed.9−13 On the other hand, the chemical catalytic conversion of biomass to chemicals has focused primarily on the dehydration of carbohydrates to produce either 5-hydroxymethylfurfural (HMF) from cellulose-derived C6 sugars or furfural from hemicellulose-derived C5 sugars as attractive platform chemicals.14−17 HMF and furfural are also among the list of top chemical opportunities from biorefinery carbohydrates as identified by Bozell and Petersen.9 HMF can be used to produce biofuels, monomers, and a diversity of useful chemicals like 2,5-furan dicarboxylic acid (FDCA), dimethylfuran, 5hydroxy-4-keto-2-pentenoic acid, levulinic acid (LA), and others.15,18−20 One of the many products derived from LA is γ-valerolactone (GVL) that has excellent properties as a solvent and is a precursor for high-value chemicals and fuels.21−26 Additionally, levulinic acid and 2,5-furan dicarboxylic acid are promising building blocks from carbohydrate-based feedstocks (the “Top 10”) identified by the Department of Energy13 and confirmed by Bozell and Petersen.9 By developing a list of specific structures, the reports9,13 simultaneously embraced fundamental research needs as a guide for product identification. Efficient and economic production of biobased chemicals via biological catalysis hinges heavily on clear identification of specific target products for focusing genetic engineering efforts. In addition, biology may not be the most effective choice to complete the transformation of oxygenated intermediates to commodity and specialty chemicals. Therefore, chemical catalysis routes are often utilized to diversify the product slates of platform chemicals to value-added chemicals. In contrast, the principal challenge for catalytic upgrading of biomass-derived sugars into biobased chemicals using chemical catalysts alone is the selective activation of multiple identical C−OH bonds of carbohydrates. Therefore, the integration of biological catalysis and chemical catalysis is emerging as an effective strategy to produce biobased chemicals from biomass carbohydrates.14,27 Very recently, the concept of bioprivileged molecules was introduced by Shanks and Keeling28 as a useful new paradigm for bridging the biological and chemical catalysis gap to develop biobased chemicals. Bioprivileged molecules are defined as “biology-derived chemical intermediates that can be efficiently converted to a diversity of chemical products including both novel molecules and drop-in replacements”.28 Key features of a bioprivileged molecule include the origin from a biological feedstock (cannot be effectively accessed from petrochemical feedstocks), reactivity allowing for facile transformations and diversity in product slates, and production of valuable chemicals encompassing both existing chemical products and novel biochemical species that can impart new product properties. An important conceptual evolution of bioprivileged molecules from biobased platform chemicals is the objective that a bioprivileged molecule provides a basis for diversity-oriented synthesis leading to novel bioproducts.28 Muconic acid, 5B

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

ACS Sustainable Chemistry & Engineering





DEVELOPMENT OF COMPUTATIONAL FRAMEWORK BASED ON NETWORK GENERATION We used the approach of automated network generation building on the previous work of Broadbelt and co-workers49−54 that generates a reaction network comprised of every possible reaction based on a set of reaction rules for given starting molecules. Molecules are represented as graphs and matrices, and operations on these representations allow reactions to be carried out, molecule uniqueness to be determined, and properties to be calculated. NetGen has been applied to a range of fields such as catalytic cracking of hydrocarbons, pyrolysis of hydrocarbons, pyrolysis of biomass and its major components, polymerization and depolymerization, production of silicon nanoparticles, tropospheric ozone formation, and biochemical transformations. Intriguingly, the concept and modeling framework of NetGen have also been extended to develop a computational program named Biological Network Integrated Computational Explorer (BNICE) to automatically design synthetic pathways for the production of a compound of interest, using generalized reaction rules curated from biochemical reaction databases.49−54 Recently, Stine et al. have demonstrated BNICE as a biochemical pathway-searching tool to predict previously unknown promiscuous enzymatic activities and novel biochemical pathways for industrial applications.53 BNICE can be applied in the identification of biochemical pathways for the production of bioprivileged molecules or biology-derived chemical species of any category. Another important component to the definition of bioprivileged molecules is that a bioprivileged molecule must allow for ready transformation to a number of valuable chemicals. This work focused more on chemical catalytic conversion of bioprivileged molecule candidates to novel and drop-in products than the biological synthesis of bioprivileged molecule candidates from carbohydrates. To computationally generate possible pathways for the transformation of a candidate bioprivileged molecule to product chemicals using NetGen, a key development in the present work is the definition of the reaction operators encoding reaction types of interest for conversion of potential bioprivileged molecules.

Research Article

DEVELOPMENT OF REACTION OPERATORS ENCODED WITH REACTION RULES FOR EACH REACTION FAMILY In this work, the reactions and reaction rules implemented in NetGen were derived directly from common catalytic reactions for converting and upgrading biomass-derived oxygenates,28,55−59 allowing for the prediction of catalytic reaction pathways of candidate bioprivileged molecules. Table 1 lists 16 types of common catalytic reaction families considered in this work, including hydrogenation, hydrogenolysis, hydrodeoxygenation, hydrolysis, dehydration, hydration, decarbonylation, decarboxylation, ketonization, esterification, keto−enol tautomerization, ring-opening/closing, aldol condensation, Diels− Alder, ketone amination, epoxidation, and selective oxidation. A detailed description of each reaction family, supporting literature references, and a rationale for its inclusion in the framework are provided in the Supporting Information. Note that any reaction operators can be easily turned on or off to further explore sensitivity analysis. We acknowledge that while the list of possible reactions is quite broad, this is not a comprehensive list of reactions associated with the catalytic chemistry of biomass-derived oxygenated hydrocarbons. For example, dehydrogenation of hydroxyl groups to an aldehyde or ketone, C−C cleavage by retro-aldolization, polymerization, isomerization, and reforming reactions could be potential candidates. While they are not considered in the present work as they are less prevalent or less well-studied reaction classes for the conversion of biobased chemicals than the ones that we incorporated here, they could be easily included in the computational framework. We acknowledge that there are other reaction families such as deoxydehydration (DODH),60−64 which removes two adjacent hydroxyl groups from vicinal diols and polyols in one step to generate alkenes directly, are useful and powerful reactions for making chemicals from renewable biomass resources, but are not included in the work either.



VISUALIZATION, IDENTIFICATION, AND EVALUATION OF GENERATED SPECIES IN NETGEN NetGen generates an SDF file for each species in the reaction network, including starting molecules and generated species. In this way, the structural images of each species were able to be downloaded from PubChem. Structural visualization can also be achieved by input of the SDF files to commercial software such as Chemaxon software (https://chemaxon.com/) and the CACTVS Chemoinformatics Toolkit (http://85.214.192.197/ index.htm). Moreover, the SDF files of species generated by NetGen were used as input files to query PubChem, and a script was created to connect the computational platform we developed and the PubChem database (http://pubchem.ncbi. nlm.nih.gov) for identification and evaluation of species.



FRAMEWORK FOR NETWORK GENERATION The modeling framework and working principle of the automated network generation program used here, NetGen, have been described in detail in the previous work of Broadbelt and co-workers.48 Briefly, NetGen uses a set of chemical operators that encodes reaction families and rules for their implementation. The required input is the structure of the reactants and the reaction rules. The input chemical table files in SDF format of the initial reactants are converted internally into molecular graphs that are operated on by chemical reaction rules and families to generate products, recursively applying chemical operators until all species and reactions are exhaustively generated to create a reaction network. Moreover, NetGen allows on-the-fly estimation of thermodynamic properties of species in the reaction network using a hierarchy based on experimental data and a group contribution method, allowing the prediction of thermodynamic and other properties from molecular structures by using group or atom properties. Note that NetGen is also capable of on-the-fly specification of kinetic parameters for each generated reaction. However, kinetics were not applied for any screening metrics applied here and thus are not discussed further.



CASE STUDY OF NETWORK GENERATION OF EXAMPLE BIOPRIVILEGED MOLECULE TAL NetGen tests all combinations of unreacted species for each reaction type considered, generates an exhaustive reaction network based on encoded reaction rules, and outputs a list of species, reactions, and thermochemical properties pertaining to the network. The generation of the reaction network is halted when the unreacted components list is empty. The current conditions for allowing a component to be placed in the unreacted components list are the maximum carbon count and C

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

In this work, methanol (allowed to undergo esterification only), formic acid (allowed to undergo esterification only), H2O (for hydration and hydrolysis), H2 (for hydrogenolysis, hydrogenation, and hydrodeoxygenation), NH3 (for reductive amination only), H2O2 (for selective oxidation and epoxidation), ethylene (allowed to undergo Diels−Alder reaction only), acetaldehyde (allowed to undergo aldol condensation only), and a C6HxOy molecule (in this case, TAL) were input as starting molecules (rank 0). A maximum carbon count of 6 and a rank of 1 were used as the stopping criteria for reaction network generation. Therefore, products formed from esterification, Diels−Alder, and ketonization reactions have a carbon number of greater than six, highlighted in red in Figure 1, and are not allowed to participate in further reactions. Figure 1 shows the generated reaction network comprised of various transformation pathways of TAL and 45 products. As shown in Figure 1, catalytic reaction pathways revealed by the experimental work of Dumesic and co-workers43,47 for the formation of a variety of commercially valuable chemical intermediates and end products have been reproduced. For example, the hydrogenation of TAL forming 5,6-dihydro-4-hydroxy-6-methyl-2H-pyran-2-one and 4-hydroxy-6-methyltetrahydro-2-pyrone, keto−enol tautomerization of TAL, the formation of parasorbic acid, and the formation of various acyclic precursors for 2,4-pentanedione, 3penten-2-one, 4-hydroxypentanone, 1,3-pentadiene, and sorbic acid emerge from network generation. Intriguingly, new pathways and a range of novel products such as bicyclic lactones via Diels−Alder reactions are also generated. Therefore, our approach is not only able to generate the pathways given in the literature but also capable of providing novel pathways toward novel products, demonstrating its versatility in handling a variety of chemical transformations.

Table 1. Reaction Families Encoded in the Chemical Operators of Network Generation



IDENTIFICATION OF BIOPRIVILEGED MOLECULES Although biology has demonstrated its ability to synthesize a variety of molecules ranging from C1 (methanol), to C2 (ethanol), to C6 (TAL), to C9 (p-coumaric acid), to oligomers, and to polymers from various renewable carbon sources or precursors, including sucrose, glycerol, glucose, xylose, and fructose, C6 glucose is the most abundant monosaccharide derived from biomass (e.g., from the polysaccharides cellulose or starch) and the most widely used aldohexose in living organisms as a source of organic carbon to synthesize a wide variety of other biomolecules.27,65 To demonstrate the approach and focus the starting point for exploration of potential bioprivileged molecules, C6 oxygenates were used as input to the network generation approach. The logic flowchart for computational identification of C6 bioprivileged molecules is illustrated in Figure 2. The methodology and criteria used in the DOE report13 and summarized in the review of Bozell and Petersen9 for sorting top chemical opportunities from biorefinery carbohydrates were used as guidelines to specify quantitative metrics for our screening protocol. Because biobased-chemical production is challenged by an overabundance of chemical species, this work is limited to screening of C6 species. Note that the threshold values related to structure, properties, reactive moieties, reactivity, reactions, and derived products are specific for the identification of C6 bioprivileged molecules. The framework can be applied to identify bioprivileged molecules with a different carbon number, but the threshold values used would vary.

the maximum number of steps that a molecule is away from the initial reactant, or the product rank (e.g., primary products are first rank, second products are second rank, etc.), of the species. In this way, large molecules generated or higher rank products are not allowed to undergo further reactions. TAL,28,43 an example bioprivileged molecule, was used here to demonstrate the network generation feature of NetGen. D

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

Figure 1. Example reaction network generated for TAL (arrows connect reactants with various products derived from them, with the specific reaction type leading to the formation of each product given below the products; species highlighted in red are molecules with more than six carbon atoms and are not allowed to participate in further reactions; species labeled with blue boxes are species known to PubChem).



SCREENING BASED ON REACTIVE MOIETIES AND REACTIVITY OF A MOLECULE Development of Reactive Moiety Matrix. There are 29252 C6HxOy species in the PubChem database (http:// pubchem.ncbi.nlm.nih.gov) as of October 9, 2017. This is an intractable number of candidate species to systematically screen by running network generation for one species at a time, which is the experimental analogue of searching for one molecule at a time and the associated pathways to produce it using synthetic biology or metabolic engineering strategies, which is a timeconsuming process. However, there are many species within the list of 29252 C6HxOy species that stood out as unattractive candidates based on their structural features such as hydrocarbons, radical and ionic species, and molecules with one functional group, suggesting an inability to be diversified. A preliminary analysis of structural groups and estimation of reactivity of species without explicitly generating all the reaction pathways can greatly reduce the number of candidate molecules for bioprivileged molecules. Thus, we developed the concept of a reactivity index to recursively quantify the number of reactive moieties derived from the reactive moieties in a molecule and its progeny. This idea builds on the concept that although biomassderived oxygenates may contain a very large number of distinct molecular components, there are only three atomic elements (C, H, and O) that constitute them and a small set of functional/ structural groups or reactive moieties such as hydroxyl, carbonyl, carboxylic acid, ether, and ester groups, double bond, and enol functionality required to summarize their representation. To

build the reactivity indices, we developed a structural matrix that consists of 25 common reactive moieties of oxygenated compounds to characterize the structural features and reactivity of oxygenated compounds, as shown in Table 2. The 25 reactive moieties can be further divided or grouped into different functional groups such as alcohol/hydroxy groups (G1-G10 and G21), ether groups (G11 and G12), a carboxylic acid group (G13), ester groups (G14 and G15), aldehyde groups (G16 and G17), ketone groups (G18-G20 and G24), an enol/phenol group (G21), double-bond groups (G21-G23), and a diene group (G25). Identification of Reactive Moieties of Candidate Molecules. The reactive moieties are specified in terms of SMiles ARbitrary Target Specification (SMARTS) strings using RDKit, which was also used to analyze the chemical elemental composition (C, H, and O), identify the symmetry, reactive moieties, rings, aromaticity, etc. in a molecule. The input files were the SDF files of molecules, and a script was written in Python that specified the substructural patterns (i.e., the Gx definitions in Table 2), which are expressed as SMARTS strings.66−68 The reactivity indices, R0 and R1, tallied in Table 2 were calculated based on how many reactive moieties are generated when a reactive moiety, Gx, is allowed to undergo the reactions encoded by the reaction operators of Table 1. With the use of the first row for G1 as an example, R0 and R1 were calculated as follows. G1, [CH2OH]-[CH]−, is a primary alcohol group (−CH2OH) that is connected to a carbon atom which is bound E

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

Figure 2. Flowchart of identification of C6 bioprivileged molecules.

but we show below that our definition of a reactivity index of a molecule based on the reactivity indices of the reactive moieties is an excellent proxy for the results of the full network generation. Prediction of the Reactivity of a Molecule. With the number of transformations that each reactive moiety can undergo specified and a mapping of the reactive moieties that will be produced when reactive moieties are reacted in hand, we developed a quantitative model for the estimation of reactivity of a species as a function of rank, termed a reactivity index, Ritotal. Specifically, reactivity indices of a molecule for rank 0 and rank 1 can be expressed as eqs 1 and 2, respectively.

to four atoms with at least one atom being an H atom. G1 can undergo four reactions: dehydration, hydrodeoxygenation, oxidation of a primary alcohol, and esterification. Dehydration of G1 forms G22 or G23. Hydrodeoxygenation of G1 forms a CH3-C group, which is an inactive group according to the reaction operators considered in this work. Oxidation of G1 forms a G16 group, and esterification of G1 with carboxylic acids forms G14. Note that HCOOH is used as a representative carboxylic acid in the list of starting molecules and is only allowed to participate in an esterification reaction. Thus, the value of R0 for G1 is 4, and four distinct reactive moieties are formed from its reactions: G22, G23, G16, and G14. To calculate the value of R1, we sum the R0 values of the four reactive moieties that are created from G1. Specifically, R1 = 3 + 1 + 6 + 1 = 11, from the R0 values of G22, G23, G16, and G14, respectively. Using this concept, we can summarize the reactivity of molecules through the reactivity indices of the reactivity moieties that comprise them, as summarized below. A detailed description of each reactive moiety, Gx, and the reactive moieties derived from them to tally the other R0 and R1 values in Table 2 is given in the Supporting Information. In addition to the reactive moieties, the atomic composition of the molecules and the symmetry of the chemical structure were also calculated. We acknowledge that the reactive moieties in Table 2 may not cover all of the relevant reactive substructures of oxygenated compounds such as acid anhydrides, but the list could easily be augmented as desired for a specific application. Also, as detailed in the Supporting Information, the derived reactive moieties and reaction number associated with each reactive moiety are not as rigorous as a full network generation,

25

R 0total =

∑ GiR i0S

(1)

i=1

where R0total is the total number of reactions that a given starting molecule (rank or generation 0 species) can undergo, Gi is the number of the ith reactive moiety within the given starting molecule, Ri0 is the number of reactions that the ith reactive moiety can undergo as tallied in Table 2, and S is the symmetry factor, with S equal to 1 if the molecule is asymmetric, and otherwise, S equals 0.5 to address the effects of a symmetric structure on the reactivity index. R1total =

ij

yz

∑ GiR i1 + ∑ jjjjjGiR i0 ∑ GjR j0zzzzz 25

25

i=1

i=1

j k

25

j≠i

z {

(2)

where R1total is the total number of reactions that a given starting molecule (rank 0) and derived rank 1 species can undergo, Gi is F

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering Table 2. Reactive Moieties and Their Reactivity Indices, R0 and R1

Figure 3. Comparison of the calculated number of reactions based on reactivity index analysis with the reaction index obtained by NetGen for subset of 160 molecules.

the number of the ith reactive moiety, Ri0 is the number of reactions that the ith reactive moiety can undergo, and Ri1 is the

sum of the number of reactions that the ith reactive moiety and its derived reactive moieties can undergo. G

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering To validate the concept of the reactivity index as a proxy for the full reaction network, the estimated reactivity indices R0total and R1total were compared with the corresponding reactivity indices obtained by network generation for a sample subset of 160 C6 oxygenated compounds. As shown in Figure 3, the calculated reactivity indices of species give reasonable agreement with the corresponding indices from network generation for both R0 and R1. The match is clearly better for R0, as expected, as fewer reactive moieties, molecules, and bimolecular reactions are involved for R0, the combinations of which would be precisely captured by network generation. Deviations in both plots can be mainly attributed to the following factors: the minimum or maximum number of reactions is used for the derived reactive moieties with uncertainty in the precise atomic environment in which the reactive moiety resides, the double counting of the same derived moieties from different parent moieties has not been addressed, and the (a)symmetry of the derived species in rank 1 has not been taken into account. Nevertheless, the agreement in Figure 3 suggests that the proposed reactivity indices can serve as a simple yet effective screening tool to pare down an enormous list of molecules for which full network generation would be intractable or not valueadded. Application of Reactive Moieties and Reactivity to Identify Potential Bioprivileged Molecules. The first step in the logic flowchart shown in Figure 2 was to download the structures, SDF files, and corresponding SMILES strings of the 29252 C6HxOy species from PubChem (step 1, Table 3). Using the SDF files as input to RDKit, all the reactive moieties in a molecule were tallied, and the reactive indices, R0 and R1, for each molecule were calculated. The results are summarized in Sheet S1. The filtering process begins at step 2, which is to remove 1323 hydrocarbon species with the number of O atoms equal to zero. The rationale is that these species originate from hydrocarbon sources and not biology-derived chemical intermediates and, therefore, are not considered to be bioprivileged molecules. Moreover, 2281 species that have a 5-membered or 6-membered ring structure but without an O atom within the ring were filtered out of the pool of candidates in step 3. The rationale is that these species are closely associated with hydrocarbons and effectively synthesized from petrochemical building blocks by introducing oxygenated functionalities, and they are not biology-derived chemical intermediates generated from a biological feedstock like sugars, which have an O-containing ring structure. Steps 4−11 represent filtering based on the analysis of structure and reactivity and comparison of the characteristics of the candidate molecules with those of the benchmark molecules. In this work, the established bioprivileged molecules 5-HMF, TAL, and muconic acid were used as benchmark molecules, and their key structural features and reactivity indices are listed in Table S1. For example, 5-HMF has six reactive moieties, including G3, G12, G17, two G23, and G25. The six reactive moieties allow for potential transformations to a number of different products. Moreover, there are five distinct reactive moieties within the six reactive moieties of 5-HMF, and the maximum number of any one reactive moiety (G23) is two, which distinguishes almost every carbon atom as having a different atomic connectivity and environment, allowing for catalytic conversion to occur selectively. In contrast, species with only one or two reactive moieties can be a useful chemical or valuable product, but they have limited ability for conversion

Table 3. Stepwise Screening of Candidate C6HxOy Molecules As Potential Bioprivileged Molecules step number 1 2 3 4 5 6 7 8 9 10 11 12

13 14 15 16 17 18 19 20 21 22 23 24

no. of species remaining (in black) or removed (in red)

description download C6HxOy in PubChem remove hydrocarbons C6Hx remove species with 5/6-membered ring but without ring −O− remove species with total reactive moiety no. < 3 remove species with total reactive moiety no. > 6 remove species with one same reactive moiety > 3 remove species with reactivity index of rank 0 < 8 remove species with reactivity index of rank 0 > 14 remove species with reactivity index of rank 1 < 65 remove species with reactivity index of rank 1 > 150 remove radical/ionic species remove species with sum of no. of literature, patent, and vendor ≤ 5 species subjected to screening by automated network generation remove species with no. of total products < 40 remove species with no. of total products > 90 remove species with no. of NetGen rxn rank 0 ≤ 6 remove species with no. of NetGen rxn rank 1 ≥ 130 remove species with no. of rxn ≤ 50 when ΔG < 15 kcal/mol remove species with no. of rxn ≥ 110 when ΔG < 15 kcal/mol remove species with no. of known products < 10 remove species with no. of literature of products ≤ 7 remove species with ratio of no. of known to novel products ≥ 0.75 remove species with ratio of no. of patents to literature < 20 remove species with predicted toxicity index > 0.67 list of 100 C6 bioprivileged molecule candidates

29252 1323 (27929) 2281 (25649) 5,979 (19,669) 1704 (17965) 3,797 (14,168) 5429 (8739) 2657 (6082) 932 (5,150) 1019 (4131) 163 (3968) 2935 (1033) 1033 (-) 205 (828) 185 (643) 1 (642) 13 (629) 4 (625) 37 (588) 34 (554) 57 (497) 314 (183) 45 (138) 38 (100) 100

into a range of products. While species with too many reactive moieties like glucose, which has seven reactive moieties, allow for a variety of transformations to a diversity of products, the selective cleavage of one of the six nearly identical C−OH bonds in glucose is a difficult problem for chemical catalysis. Overall, we emphasized that a preferred candidate as a bioprivileged molecule should have enough reactive moieties and reactivity to allow for diversity in its product slate but not excess functionality that would lead to selectivity challenges for chemical catalysis. Therefore, step 4 addresses the removal of 5979 C6HxOy species that have a total number of reactive moieties less than three. In this work, the maximum number of reactive moieties for a C6 candidate molecule was six, and 1704 C6 species with more than six reactive moieties were filtered in step 5. Moreover, no molecules with more than three of the same reactive moiety or functional groups within a candidate molecule were allowed to H

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

candidate molecule can undergo and the associated reaction types, reaction pathways, and the product slate that is formed are explicitly provided. 5-HMF, TAL, and muconic acid are again used as benchmark molecules to identify bioprivileged molecules according to measures associated with the detailed reaction network. As listed in Table S1, the number of distinct products derived from 5-HMF, TAL, and muconic acid is 84, 41, and 81, respectively. Therefore, the lower and upper bounds for the number of derived products were set to be 40 and 90, respectively. Step 13 removes 205 species with limited diversity in the product slate, and step 14 removes 185 species for forming too many products. As a further refinement, bounds on the reactivity indices R0 and R1, based on the explicit network generation results that were tallied, were used to remove 1 and 13 species in steps 15 and 16, respectively. While the reaction operators are all defined based on common chemical transformations, the facility with which the reactions can be carried out would be highly dependent on catalyst design, reactor conditions, and reactor engineering and separation challenges, which suggests that kinetic screening would be impractical at this stage, particular for novel reactions. However, thermodynamics are straightforward to quantify and can be used to screen out reactions that would be too far uphill at typical conditions to be practical. The free energy of reaction, ΔG, was calculated using a group contribution method. In this work, a threshold value of 15 kcal/mol was used for ΔG at 293.15 K to define thermodynamically unfavorable reactions. NetGen reactivity indices were then recalculated in light of reactions removed by this criteria, and the set of molecules was then filtered to remove candidate C6 species in steps 17 and 18. Another key attribute of a bioprivileged molecule is that the diversity of chemical products derived from bioprivileged molecules includes both novel molecules and drop-in replacements. In this work, all the products generated from a C6 starting molecule were subjected to an inquiry in PubChem. We define products that have already been registered in PubChem as a product known to PubChem, and one that is not registered is labeled as a product novel to PubChem. For the products known to PubChem, the number of literature reports, the number of patents, and the number of vendors associated with each product were obtained from PubChem. Then, the sum of the number of literature reports, the sum of the number of patents, and the sum of the number of vendors for all the known products derived from a candidate molecule were calculated and used to roughly reflect the attractiveness of the product slate. It was then required that a bioprivileged molecule have at least 10 products known to PubChem, and the summation of the number of literature reports for all the known products must be larger than seven, a lower bound inspired by the benchmark molecules. C6 species that did not meet this criteria were removed by step 19 and step 20, respectively. An important conceptual evolution of bioprivileged molecules from biobased platform chemicals is the objective that a bioprivileged molecule provides a basis for diversity-oriented synthesis leading to novel bioproducts.28 In this work, an index, which is defined as the ratio of the number of products known to PubChem to the number of products novel to PubChem derived from a given starting molecule, referred to here as “known/novel index”, was introduced to characterize the potential of a bioprivileged molecule to make novel bioproducts. As shown in Table S1, the values of the known/novel index for 5-HMF, TAL, and muconic acid are all less than 0.7, and therefore, a molecule with a known/novel index of ≥0.75 was removed,

advance to the next step, which promotes the possibility that each type of chemical reaction can be achieved in a selective manner. 3797 species were filtered according to this criterion in step 6. The concept that a bioprivileged molecule must have enough reactivity to allow for transformation to a number of valuable chemicals was then further honed by applying the values of the reactivity indices, which is a more precise assessment of the potential for diversity, yet selectivity. The reactivity indices of three established bioprivileged molecules, 5-HMF, TAL and muconic acid, range from 8 to 13 for R0 and 69 to 147 for R1. Therefore, ranges from 8 to 14 for R0 and 65 to 150 for R1 were set to screen candidate bioprivileged molecules. As shown in Table 3, steps 7 and 9 remove C6 species that allow fewer than eight transformations for rank 0 and fewer than 65 transformations for rank 1, respectively. Steps 8 and 10 address the excess reactivity that may cause problems in selective chemical transformation. In accordance with this metric, 2657 species were removed by step 8, and 1019 species were removed by step 10. At this stage, there are 4131 candidate bioprivileged molecules remaining. The next step was to use the PubChem database to characterize additional features of these molecules. In particular, the category of the species and the numbers of literature, patents, and vendors for each molecule were assessed. The results are provided in Sheet S1. Step 11 addresses the removal of radical species, ionic species, and complexes with crystal structures or solvents, which account for 163 species. Note that the numbers of literature, patents, and vendors for each molecule are only based on the records of PubChem, which do not necessarily represent the completed records for literature, patents, and vendors associated with the specific molecules. In this work, the sum of the numbers of literature, patents, and vendors was used as a rough index, named LPV index, to reflect the attention received in the literature regarding the molecule through articles and citations, primarily from academic sources, the potential of the species in practical and industrial application as measured by patents, and the ability of the species to be purchased and the commercial interest of the species to chemical vendors or suppliers. If a molecule has a small value of the LPV index, we assert that the molecule is not attractive as a candidate bioprivileged molecule. In this work, species with an LPV index less than or equal to five were filtered in step 12. We acknowledge that the threshold five for the LPV index is arbitrary and that this step may remove a species that might be an attractive candidate as a bioprivileged molecule since it may simply be an unrecognized opportunity, which one could argue would be an interesting list to tally. Overall, the size of the C6 candidate list was reduced from 29252 to 1033 by applying the analysis of reactive moieties and reactivity of species. We note that the framework that we developed can be easily tailored to different quantitative values of the criteria applied here or even the incorporation of other screening metrics. Application of NetGen to Identify Potential Bioprivileged Molecules. The remaining 1033 C6HxOy species were individually subjected to further screening using automated network generation to evaluate each species’ derived products and reaction network, which are summarized in Sheet S1. Emphasizing that bioprivileged molecules must allow for transformation to a diversity of chemical products, the results of automated network generation are particularly germane, as detailed information about every possible transformation a I

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

Figure 4. 100 bioprivileged molecule candidates. The numbers in the figure are PubChem compound identification numbers (CID), and each number represents a unique molecule.

important criterion for screening candidate bioprivileged molecules. The Toxicity Estimation Software Tool (TEST)70−77 developed by the United States Environmental Protection Agency, which estimates the toxicity values of organic chemicals based on molecular structure using Quantitative Structure−Activity Relationships (QSARs), was employed for estimation of toxicity of species. In this work, a list of SMILES strings of species which obtained by translation of SDF files using the PubChem Download Service, and the developmental toxicity values of species were calculated by TEST using the consensus method. As shown in Table S1, the toxicity values of 5-HMF, TAL, and muconic acid are 0.11, 0.65, and 0.30, respectively. Therefore, 38 species with a predicted toxicity value of >0.67 were removed by step 23. Overall, the application of the screening criteria in steps 2−23 gives a list of 100 C6 bioprivileged molecule candidates, which are summarized in Figure 4, according to PubChem compound identification number (CID), with each number representing a unique molecule. For example, CID values of 310, 237332, and 54675757 represent muconic acid, 5-HMF, and TAL,

resulting in the elimination of 314 C6HxOy species in step 21. Moreover, we defined another index, which is the ratio of the summation of the number of patents related to all derived products to the summation of the number of literature reports related to all derived products, referred to as the “patents/ literature index”, to characterize the potential of the products known to PubChem in practical and industrial applications versus the attention received. A lower value of a patents/ literature index suggests that the derived products of a candidate C6HxOy molecule have been reasonably well-studied in the literature, but they may have little potential in practical and industrial applications on a relative basis. Step 22 removes 45 C6HxOy species that give a patents/literature index of less than 20. Due to the biological origin of bioprivileged molecules or biobased platform chemicals, there is often a presumption of being “green” and thus having low toxicity, but Ventura et al.69 pointed out toxicity often prevents biomass-derived oxygenates such as furans and their derivatives from serving as platform chemicals. In this work, we introduced the toxicity index as an J

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

Figure 5. Nine broad compound classes into which the 100 bioprivileged molecule candidates fall (left) and number of appearances of reactive moieties in the list of 100 bioprivileged molecule candidates (right). The color coding in the image on the left carries through to Figure S2, in which the structures of all 100 molecules and the compound class into which they fall are shown.

Figure 6. Example reaction network generated for coumalic acid (arrows represent connections between reactants and products, specific reaction types leading to the formation of each product are given below the products, reaction in red means the ΔG of this reaction step is higher than 15 kcal/mol and therefore is not considered as a thermodynamically feasible reaction step, species highlighted in red are molecules with more than six carbon atoms and are not allowed to participate in further reactions, and species labeled with blue boxes are species known to PubChem).

respectively. Sensitivity analysis was also carried out to examine how the choices of the particular cutoff values affected the final set of molecules, and as summarized in Tables S2 and S3, varying the parameters within a reasonable range does not dramatically impact the composition of the final list of C6HxOy bioprivileged molecule candidates. Analysis of the Bioprivileged Structures of the Bioprivileged Molecules. The structures of the 100 C6 bioprivileged molecule candidates fall into one of nine broad compound classes summarized in Figure 5 and are provided in Figure S2 (grouped according to nine compound classes) and Figure S3 (individually). Analysis of the structural features and groups of 100 C6 bioprivileged molecule candidates was performed to deepen the understanding of bioprivileged structures and functionalities. Table S4 summarizes the

appearance of reactive moieties and functional groups within the 100 molecules. As shown in Table S4, the most common reactive moiety contained within the candidate bioprivileged molecules is G12, which is the ring [O] group that can be easily preserved from biomass carbohydrates and/or carbohydratederived compounds. Carboxylic acid group G13 also appears frequently, with 53 occurrences within the 100 molecules. Moreover, G22, G23, and G13 are the three most preferred reactive moieties that a bioprivileged molecule has. Like G22 and G23, diene group G25 is also a favorable substructure of a bioprivileged molecule. Hydroxyl reactive moieties G1 and G4, ether group G11, and aldehyde G16 have average levels of appearance in the 100 molecules, while no bioprivileged molecule contains any instance of G2, G6, or G9, indicating that those groups are not preferred within a bioprivileged K

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

ACS Sustainable Chemistry & Engineering



molecule. As shown in Table S4, all the bioprivileged molecule candidates are asymmetric, except three molecules, including muconic acid. There are 47 molecules with a 5/6-membered oxygen heterocycle. Overall, this analysis suggests that bioprivileged molecules have higher preferences for ether groups within a ring and double bonds, especially in the form of dienes, than for hydroxyl groups or carboxylic acid groups, which are in turn more favorable than carbonyl groups and ester groups. This provides informative guidance for design of target bioprivileged molecules. Interestingly, there are 29 species in the list of 100 bioprivileged molecule candidates that have zero as the number of vendors, meaning these compounds are not purchasable according to PubChem at this time. This could be due to factors such as challenging synthesis or high cost, but it may also be that no study before the present one identified their potential. Given the diversity of products that these candidate molecules can generate via a range of transformations, it would be very interesting to explore the synthesis of these compounds, both as interesting synthetic challenges and as candidates for practical applications. Case Study of Coumalic Acid As a Candidate Bioprivileged Molecule. Coumalic acid, PubChem CID 68141, is one of the 100 species which was identified as bioprivileged in this work. Coumalic acid has five reactive moieties in total and five distinct reactive moieties, [COOH] (G13), ring −[COO]− (G15), CC without O (G22), CC with O (G23), and CC−CC (G25). Thus, it has traits that are common in bioprivileged molecules, in that each of the carbon atoms has different chemical environments and, as a result, distinct reactivity patterns. The reactivity indices for rank 0 and rank 1 are 10 and 115, respectively, which match well with the NetGen reactivity indices R0 equal to 10 and R1 equal to 97. As shown in Figure 6, automated network generation creates a detailed reaction network leading to the formation of 78 products via one-step reactions or two-step pathways. Among the products, there are 17 species that have been documented in PubChem, and the remainder are novel to PubChem. The majority of the known products in Figure 6 are derived from coumalic acid through decarboxylation and esterification of G13, hydrolysis of G15, and hydrogenation of G22 and G23. For example, esterification of coumalic acid with methanol in H2SO4 to produce methyl coumalate is a well-studied reaction.78−81 Esterification of coumalic acid with phenols or glycols such as mmethoxyphenol, o-methoxyphenol, triethylene glycol, and pentamethylene glycol have also been reported.81 Methyl coumalate is not allowed to undergo further reactions since it is a C7 species, but it has been widely used as a reagent in phosphine-catalyzed annulation of modified allylic carbonates and in the preparation of 7-carboxyquinolizinium derivatives. Moreover, Kraus et al.78 reported that methyl coumalate could be used in several inverse electron demand Diels−Alder reactions to generate terphenyls with value in the areas of organic light-emitting diodes and molecular electronics. However, note that coumalic acid can readily undergo Diels− Alder reactions of G25 for the production of a variety of novel bicyclic lactones. Kraus and co-workers79 have demonstrated that disubstituted aromatic compounds such as terephthalic acid can be produced in good selectivity and yield from Diels−Alder reactions of coumalic acid and with inactivated alkenes.

Research Article

SUMMARY

Bioprivileged molecules are emerging as a useful new paradigm for developing biobased chemicals. However, the identification of bioprivileged molecules has relied on extensive experimental work over a long period of time. In this work, to the best of our knowledge, the first computational platform for systematically screening and identifying potential bioprivileged molecules was developed. Specifically, a structural matrix consisting of 25 reactive moieties was developed to characterize the structural features of oxygenates, and a summation metric based on the structural features that can be created from a given reactive moiety based on a set of defined chemical transformations was defined. Application of the proposed computational platform for identification of 100 bioprivileged molecule candidates from 29252 C6 compounds in PubChem involved 24 steps in total. The number of C6 species was reduced from 29252 to 1033 by analysis of structural features, tallying the reactive moieties, and evaluating reactivity of species using the simple reactivity index that we developed. Automated network generation, which is able to generate every possible compound and reaction from a set of starting compounds and reaction rules, creating a large reaction network that includes known products and novel compounds, was then carried out to screen the list of molecules further. Reactions were encoded as reaction rules of typical catalytic reaction families applicable in transforming biomass-derived oxygenates. The case study of TAL demonstrated that automated network generation is able to generate both reaction pathways reported in the literature and novel pathways for transformation of TAL to various novel products. The 100 bioprivileged molecule candidates were obtained by further application of automated network generation to evaluate the remaining 1033 C6 species based on the attractiveness of the starting molecules, transformation pathways, and derived products, including tallying the number of products known to PubChem and species novel to PubChem. Sensitivity analysis results showed that the parameters used in the filtering steps for identification of C6 bioprivileged molecules do have an impact on the number of species being removed in each step. However, varying the parameters within a reasonable range does not dramatically impact the composition of the final list of C6HxOy bioprivileged molecule candidates. The computational platform for the identification of C6HxOy bioprivileged molecules based on analysis of reactive moieties and reactivity in both a general and detailed sense is readily applicable to the identification of bioprivileged molecules with a different carbon number. Moreover, the analysis of reactive moieties and reactivity can be used to quantitatively describe and predict structure−reactivity relationships but also can be applied to guide the design of bioprivileged molecules. Analysis of the C6HxOy bioprivileged molecules creates a paradigm for establishing a rational structure for the design of bioprivileged molecules in general. Using this framework to distill potential C6HxOy molecules to a total of 100 potential candidates, a more tractable set is available to assess the product molecules that can be readily accessed from the bioprivileged molecules. The framework also can be run in reverse if a specific final product were desired in order to find the intermediates that are candidates for its precursor, which could include the bioprivileged molecules already identified. Assessing the present list of 100 candidate molecules, which is the subject of ongoing work, will allow further refinement of the constraints used to establish this initial L

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering

Monofunctional Hydrocarbons and Targeted Liquid-Fuel Classes. Science 2008, 322 (5900), 417−421. (3) Vispute, T. P.; Zhang, H.; Sanna, A.; Xiao, R.; Huber, G. W. Renewable Chemical Commodity Feedstocks from Integrated Catalytic Processing of Pyrolysis Oils. Science 2010, 330 (6008), 1222−1227. (4) Mettler, M. S.; Vlachos, D. G.; Dauenhauer, P. J. Top Ten Fundamental Challenges of Biomass Pyrolysis for Biofuels. Energy Environ. Sci. 2012, 5 (7), 7797−7809. (5) Chundawat, S. P. S.; Beckham, G. T.; Himmel, M. E.; Dale, B. E. Deconstruction of Lignocellulosic Biomass to Fuels and Chemicals. Annu. Rev. Chem. Biomol. Eng. 2011, 2, 121−145. (6) Mäki-Arvela, P.; Simakova, I. L.; Salmi, T.; Murzin, D. Y. Production of Lactic Acid/Lactates from Biomass and Their Catalytic Transformations to Commodities. Chem. Rev. 2014, 114 (3), 1909− 1971. (7) Gao, C.; Ma, C.; Xu, P. Biotechnological routes based on lactic acid production from biomass. Biotechnol. Adv. 2011, 29 (6), 930−939. (8) Dodds, D. R.; Gross, R. A. Chemicals from Biomass. Science 2007, 318 (5854), 1250−1251. (9) Bozell, J. J.; Petersen, G. R. Technology development for the production of biobased products from biorefinery carbohydrates-the US Department of Energy’s ″Top 10″ revisited. Green Chem. 2010, 12 (4), 539−554. (10) Choi, S.; Song, C.; Shin, J.; Lee, S. Biorefineries for the production of top building block chemicals and their derivatives. Metab. Eng. 2015, 28, 223−239. (11) Fiorentino, G.; Ripa, M.; Ulgiati, S. Chemicals from biomass: technological versus environmental feasibility. A review. Biofuels, Bioprod. Biorefin. 2017, 11 (1), 195−214. (12) Jang, Y. S.; Kim, B.; Shin, J. H.; Choi, Y. J.; Choi, S.; Song, C. W.; Lee, J.; Park, H. G.; Lee, S. Y. Bio-based production of C2−C6 platform chemicals. Biotechnol. Bioeng. 2012, 109 (10), 2437−2459. (13) Werpy, T.; Petersen, G. Top Value Added Chemicals from Biomass Volume IResults of Screening for Potential Candidates from Sugars and Synthesis Gas; US Department of Energy, Washington DC, 2004. (14) Schwartz, T. J.; O’Neill, B. J.; Shanks, B. H.; Dumesic, J. A. Bridging the Chemical and Biological Catalysis Gap: Challenges and Outlooks for Producing Sustainable Chemicals. ACS Catal. 2014, 4 (6), 2060−2069. (15) Saha, B.; Abu-Omar, M. M. Advances in 5-hydroxymethylfurfural production from biomass in biphasic solvents. Green Chem. 2014, 16 (1), 24−38. (16) Chheda, J. N.; Roman-Leshkov, Y.; Dumesic, J. A. Production of 5-hydroxymethylfurfural and furfural by dehydration of biomassderived mono- and poly-saccharides. Green Chem. 2007, 9 (4), 342− 350. (17) Salak Asghari, F.; Yoshida, H. Acid-Catalyzed Production of 5Hydroxymethyl Furfural from d-Fructose in Subcritical Water. Ind. Eng. Chem. Res. 2006, 45 (7), 2163−2173. (18) Román-Leshkov, Y.; Barrett, C. J.; Liu, Z. Y.; Dumesic, J. A. Production of dimethylfuran for liquid fuels from biomass-derived carbohydrates. Nature 2007, 447, 982. (19) Zhang, Y.; Zhang, J.; Su, D. 5-Hydroxymethylfurfural: A key intermediate for efficient biomass conversion. J. Energy Chem. 2015, 24 (5), 548−551. (20) Rosatella, A. A.; Simeonov, S. P.; Frade, R. F. M.; Afonso, C. A. M. 5-Hydroxymethylfurfural (HMF) as a building block platform: Biological properties, synthesis and synthetic applications. Green Chem. 2011, 13 (4), 754−793. (21) Alonso, D. M.; Wettstein, S. G.; Dumesic, J. A. Gammavalerolactone, a sustainable platform molecule derived from lignocellulosic biomass. Green Chem. 2013, 15 (3), 584−595. (22) Mellmer, M. A.; Martin Alonso, D.; Luterbacher, J. S.; Gallo, J. M. R.; Dumesic, J. A. Effects of [gamma]-valerolactone in hydrolysis of lignocellulosic biomass to monosaccharides. Green Chem. 2014, 16 (11), 4659−4662. (23) Palkovits, R. Pentenoic Acid Pathways for Cellulosic Biofuels. Angew. Chem., Int. Ed. 2010, 49 (26), 4336−4338.

set. With that refinement, it will also be possible to extend the work to other biological candidates, including C4s and C5s. The assessment will also involve connecting the identified C6 candidates with a framework such as BNICE52−54 to further elucidate which of the bioprivileged molecules seem reasonable as biological targets.



ASSOCIATED CONTENT

S Supporting Information *

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acssuschemeng.8b05275. Reaction families, reaction operators, and example reaction matrix; reaction moieties and their reactivity indices, R0 and R1; benchmark bioprivileged molecules; classes of C6 bioprivileged molecule candidates; structural analysis of C6 bioprivileged molecules, and sensitivity analysis (PDF) Reactive moiety and reactivity values (XLSX)



AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: +1-847-467-1751. Fax: +1-847-491-3728. ORCID

Zachary J. Brentzel: 0000-0001-8577-2963 George A. Kraus: 0000-0002-3037-8413 James A. Dumesic: 0000-0001-6542-0856 Brent H. Shanks: 0000-0002-1805-415X Linda J. Broadbelt: 0000-0003-4253-592X Author Contributions

X.Z., Z.J.B., G.A.K., P.L.K., J.A.D., B.H.S., and L.J.B. designed the research, X.Z. performed the research, and X.Z. and L.J.B. wrote the paper. Notes

The authors declare no competing financial interest.



ACKNOWLEDGMENTS The approach advocated in this opinion is reflected in the core mission of the National Science Foundation Engineering Research Center for Biorenewable Chemicals (CBiRC). CBiRC is funded by the National Science Foundation under Award EEC-0813570. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors and do not necessarily represent the views of the National Science Foundation. The authors thank Dr. Lindsay Oakley, Dr. Rob Brydon, and Dr. James Jeffryes at Northwestern University, Pradeep Natarajan at Indian Institute of Technology Madras, Professor Matthew Neurock at the University of Minnesota, Professor Bob Davis at the University of Virginia, Dr. Wolf-Dietrich Ihlenfeldt at Xemistry GmbH in Germany, and Drs. Paul A. Thiessen and Bo Yu at NCBI/NLM/NIH for fruitful discussions and useful suggestions.



REFERENCES

(1) Carpenter, D.; Westover, T. L.; Czernik, S.; Jablonski, W. Biomass Feedstocks for Renewable Fuel Production: A Review of the Impacts of Feedstock and Pretreatment on the Yield and Product Distribution of Fast Pyrolysis Bio-oils and Vapors. Green Chem. 2014, 16 (2), 384−406. (2) Kunkes, E. L.; Simonetti, D. A.; West, R. M.; Serrano-Ruiz, J. C.; Gärtner, C. A.; Dumesic, J. A. Catalytic Conversion of Biomass to M

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering (24) Wright, W. R. H.; Palkovits, R. Development of Heterogeneous Catalysts for the Conversion of Levulinic Acid to γ-Valerolactone. ChemSusChem 2012, 5 (9), 1657−1667. (25) Luterbacher, J. S.; Rand, J. M.; Alonso, D. M.; Han, J.; Youngquist, J. T.; Maravelias, C. T.; Pfleger, B. F.; Dumesic, J. A. Nonenzymatic Sugar Production from Biomass Using Biomass-Derived γ-Valerolactone. Science 2014, 343 (6168), 277−280. (26) Bond, J. Q.; Alonso, D. M.; Wang, D.; West, R. M.; Dumesic, J. A. Integrated Catalytic Conversion of γ-Valerolactone to Liquid Alkenes for Transportation Fuels. Science 2010, 327 (5969), 1110−1114. (27) Schwartz, T. J.; Shanks, B. H.; Dumesic, J. A. Coupling chemical and biological catalysis: a flexible paradigm for producing biobased chemicals. Curr. Opin. Biotechnol. 2016, 38, 54−62. (28) Shanks, B. H.; Keeling, P. L. Bioprivileged molecules: creating value from biomass. Green Chem. 2017, 19 (14), 3177−3185. (29) Xie, D.; Shao, Z.; Achkar, J.; Zha, W.; Frost, J. W.; Zhao, H. Microbial synthesis of triacetic acid lactone. Biotechnol. Bioeng. 2006, 93 (4), 727−736. (30) Cardenas, J.; Da Silva, N. Metabolic engineering of Saccharomyces cerevisiae for the production of triacetic acid lactone (TAL). Abstr. Pap. Am. Chem. Soc 2013, 245. (31) Tang, S. Y.; Qian, S.; Akinterinwa, O.; Frei, C. S.; Gredell, J. A.; Cirino, P. C. Screening for Enhanced Triacetic Acid Lactone Production by Recombinant Escherichia coli Expressing a Designed Triacetic Acid Lactone Reporter. J. Am. Chem. Soc. 2013, 135 (27), 10099−10103. (32) Cardenas, J.; Da Silva, N. A. Metabolic engineering of Saccharomyces cerevisiae for the production of triacetic acid lactone. Metab. Eng. 2014, 25, 194−203. (33) Saunders, L. P.; Bowman, M. J.; Mertens, J. A.; Da Silva, N. A.; Hector, R. E. Triacetic acid lactone production in industrial Saccharomyces yeast strains. J. Ind. Microbiol. Biotechnol. 2015, 42 (5), 711−721. (34) Markham, K. A.; Palmer, C. M.; Chwatko, M.; Wagner, J. M.; Murray, C.; Vazquez, S.; Swaminathan, A.; Chakravarty, I.; Lynd, N. A.; Alper, H. S. Rewiring Yarrowia lipolytica toward triacetic acid lactone for materials generation. Proc. Natl. Acad. Sci. U. S. A. 2018, 115 (9), 2096−2101. (35) Kraus, G. A.; Wanninayake, U. K. An improved aldol protocol for the preparation of 6-styrenylpyrones. Tetrahedron Lett. 2015, 56 (51), 7112−7114. (36) Bedford, C. T.; Money, T. Photochemistry of 4-Hydroxy-6Methyl-(2h)-Pyran-2-One (Triacetic Acid Lactone). J. Chem. Soc. D 1969, 0 (13), 685−686. (37) Evans, G. E.; Leeper, F. J.; Murphy, J. A.; Staunton, J. Triacetic Acid Lactone as a Polyketide Synthon - Synthesis of Toralactone and Polyketide-Type Anthracene-Derivatives. J. Chem. Soc., Chem. Commun. 1979, 5, 205−206. (38) Bacardit, R.; Morenomanas, M. Hydrogenations of Triacetic Acid Lactone - New Synthesis of the Carpenter Bee (XylocopaHirsutissima) Sex-Pheromone. Tetrahedron Lett. 1980, 21 (6), 551− 554. (39) Moreno-Mañas, M.; Pleixats, R.; de March, P.; Ripoll, I. Alkylations at C-3 and C-5 of Triacetic Acid Lactone and Derivatives. Heterocycles 1984, 21, 766−766. (40) Demarch, P.; Morenomanas, M.; Roca, J. L.; Germain, G.; Piniella, J. F.; Dideberg, O. Reactions of Triacetic Acid Lactone with Carbonyl-Compounds - X-Ray Structure Determination of 3Acetoacetyl-2-Chromenone and 3,6,9,12-Tetramethyl-1h,6h,7h,12h6,12-Methanodipyrano[4,3-B-4,3-F]-Dioxocin-1,7-Dione. J. Heterocycl. Chem. 1986, 23 (5), 1511−1513. (41) Morenomanas, M.; Pleixats, R. Dehydroacetic Acid, Triacetic Acid Lactone, and Related Pyrones. Adv. Heterocycl. Chem. 1992, 53, 1− 84. (42) Younis, Y. M.; Al-Shihry, S. S. Triacetic acid lactone methyl ether as a natural products synthon. Aust. J. Chem. 2000, 53 (7), 589−591. (43) Chia, M.; Schwartz, T. J.; Shanks, B. H.; Dumesic, J. A. Triacetic acid lactone as a potential biorenewable platform chemical. Green Chem. 2012, 14 (7), 1850−1853.

(44) Kraus, G. A.; Basemann, K.; Guney, T. Selective pyrone functionalization: reductive alkylation of triacetic acid lactone. Tetrahedron Lett. 2015, 56 (23), 3494−3496. (45) Benferrah, N.; Hammadi, M.; Berthiol, F. Application of Microwave Heating of Dry Organic Reactions: New Condensation Products from Triacetic Acid Lactone (Tal). Rev. Roum. Chim. 2016, 61 (11−12), 863−869. (46) Kraus, G. A.; Wanninayake, U. K.; Bottoms, J. Triacetic acid lactone as a common intermediate for the synthesis of 4-hydroxy-2pyridones and 4-amino-2-pyrones. Tetrahedron Lett. 2016, 57 (11), 1293−1295. (47) Chia, M.; Haider, M. A.; Pollock, G.; Kraus, G. A.; Neurock, M.; Dumesic, J. A. Mechanistic Insights into Ring-Opening and Decarboxylation of 2-Pyrones in Liquid Water and Tetrahydrofuran. J. Am. Chem. Soc. 2013, 135 (15), 5699−5708. (48) Broadbelt, L. J.; Stark, S. M.; Klein, M. T. Computer Generated Pyrolysis Modeling: On-the-Fly Generation of Species, Reactions, and Rates. Ind. Eng. Chem. Res. 1994, 33 (4), 790−799. (49) Finley, S. D.; Broadbelt, L. J.; Hatzimanikatis, V. Computational Framework for Predictive Biodegradation. Biotechnol. Bioeng. 2009, 104 (6), 1086−1097. (50) Finley, S. D.; Broadbelt, L. J.; Hatzimanikatis, V. In silico feasibility of novel biodegradation pathways for 1,2,4-trichlorobenzene. BMC Syst. Biol. 2010, 4 (1), 7. (51) González-Lergier, J.; Broadbelt, L. J.; Hatzimanikatis, V. Theoretical Considerations and Computational Analysis of the Complexity in Polyketide Synthesis Pathways. J. Am. Chem. Soc. 2005, 127 (27), 9930−9938. (52) Pertusi, D. A.; Stine, A. E.; Broadbelt, L. J.; Tyo, K. E. J. Efficient searching and annotation of metabolic networks using chemical similarity. Bioinformatics 2015, 31 (7), 1016−1024. (53) Stine, A.; Zhang, M.; Ro, S.; Clendennen, S.; Shelton, M. C.; Tyo, K. E. J.; Broadbelt, L. J. Exploring De Novo metabolic pathways from pyruvate to propionic acid. Biotechnol. Prog. 2016, 32 (2), 303−311. (54) Henry, C. S.; Broadbelt, L. J.; Hatzimanikatis, V. Discovery and analysis of novel metabolic pathways for the biosynthesis of industrial chemicals: 3-hydroxypropanoate. Biotechnol. Bioeng. 2010, 106 (3), 462−473. (55) Chheda, J. N.; Huber, G. W.; Dumesic, J. A. Liquid-phase catalytic processing of biomass-derived oxygenated hydrocarbons to fuels and chemicals. Angew. Chem., Int. Ed. 2007, 46 (38), 7164−7183. (56) Alonso, D. M.; Bond, J. Q.; Dumesic, J. A. Catalytic conversion of biomass to biofuels. Green Chem. 2010, 12 (9), 1493−1513. (57) Serrano-Ruiz, J. C.; West, R. M.; Dumesic, J. A. Catalytic Conversion of Renewable Biomass Resources to Fuels and Chemicals. Annu. Rev. Chem. Biomol. Eng. 2010, 1, 79−100. (58) Mortensen, P. M.; Grunwaldt, J. D.; Jensen, P. A.; Knudsen, K. G.; Jensen, A. D. A review of catalytic upgrading of bio-oil to engine fuels. Appl. Catal., A 2011, 407 (1−2), 1−19. (59) Alonso, D. M.; Wettstein, S. G.; Dumesic, J. A. Bimetallic catalysts for upgrading of biomass to fuels and chemicals. Chem. Soc. Rev. 2012, 41 (24), 8075−8098. (60) Boucher-Jacobs, C.; Nicholas, K. M., Deoxydehydration of Polyols. In Selective Catalysis for Renewable Feedstocks and Chemicals, Nicholas, K. M., Ed. Springer International Publishing: Cham, 2014; pp 163−184. (61) Gossett, J.; Srivastava, R. Rhenium-catalyzed deoxydehydration of renewable biomass using sacrificial alcohol as reductant. Tetrahedron Lett. 2017, 58 (39), 3760−3763. (62) Kwok, K. M.; Choong, C. K. S.; Ong, D. S. W.; Ng, J. C. Q.; Gwie, C. G.; Chen, L.; Borgna, A. Hydrogen-Free Gas-Phase Deoxydehydration of 2,3-Butanediol to Butene on Silica-Supported Vanadium Catalysts. ChemCatChem 2017, 9 (13), 2443−2447. (63) Liu, S.; Yi, J.; Abu-Omar, M. M., Deoxydehydration (DODH) of Biomass-Derived Molecules. In Reaction Pathways and Mechanisms in Thermocatalytic Biomass Conversion II Homogeneously Catalyzed Transformations, Acrylics from Biomass, Theoretical Aspects, Lignin Valorization and Pyrolysis Pathways:, Schlaf, M.; Zhang, Z. C., Eds. Springer Singapore: Singapore, 2016; pp 1−11. N

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX

Research Article

ACS Sustainable Chemistry & Engineering (64) Petersen, A. R.; Nielsen, L. B.; Dethlefsen, J. R.; Fristrup, P. Vanadium-Catalyzed Deoxydehydration of Glycerol Without an External Reductant. ChemCatChem 2018, 10 (4), 769−778. (65) Linger, J. G.; Vardon, D. R.; Guarnieri, M. T.; Karp, E. M.; Hunsinger, G. B.; Franden, M. A.; Johnson, C. W.; Chupka, G.; Strathmann, T. J.; Pienkos, P. T.; Beckham, G. T. Lignin valorization through integrated biological funneling and chemical catalysis. Proc. Natl. Acad. Sci. U. S. A. 2014, 111 (33), 12013−12018. (66) SMARTS Theory Manual; Daylight Chemical Information Systems, Santa Fe, NM. (67) SMARTS Tutorial; Daylight Chemical Information Systems, Santa Fe, NM. (68) SMARTS Examples; Daylight Chemical Information Systems, Santa Fe, NM. (69) Ventura, S. P. M.; de Morais, P.; Coelho, J. A. S.; Sintra, T.; Coutinho, J. A. P.; Afonso, C. A. M. Evaluating the toxicity of biomass derived platform chemicals. Green Chem. 2016, 18 (17), 4733−4742. (70) Martin, T. Toxicity Estimation Software Tool (TEST). U.S. Environmental Protection Agency: Washington, DC, 2016. (71) EPA, U. S. User’s Guide for T.E.S.T. (version 4.2) (Toxicity Estimation Software Tool): A Program to Estimate Toxicity from Molecular Structure. 2016. (72) Benfenati, E.; Benigni, R.; Demarini, D. M.; Helma, C.; Kirkland, D.; Martin, T. M.; Mazzatorta, P.; Ouedraogo-Arras, G.; Richard, A. M.; Schilter, B.; Schoonen, W. G.; Snyder, R. D.; Yang, C. Predictive models for carcinogenicity and mutagenicity: frameworks, state-of-the-art, and perspectives. J. Environ. Sci. Health. C Environ. Carcinog. Ecotoxicol. Rev. 2009, 27 (2), 57−90. (73) Cassano, A.; Manganaro, A.; Martin, T.; Young, D.; Piclin, N.; Pintore, M.; Bigoni, D.; Benfenati, E. CAESAR models for developmental toxicity. Chem. Cent. J. 2010, 4, S4. (74) Martin, T. M.; Harten, P.; Venkatapathy, R.; Das, S.; Young, D. M. A hierarchical clustering methodology for the estimation of toxicity. Toxicol. Mech. Methods 2008, 18 (2−3), 251−266. (75) Sushko, I.; Novotarskyi, S.; Korner, R.; Pandey, A. K.; Cherkasov, A.; Li, J. Z.; Gramatica, P.; Hansen, K.; Schroeter, T.; Muller, K. R.; Xi, L. L.; Liu, H. X.; Yao, X. J.; Oberg, T.; Hormozdiari, F.; Dao, P. H.; Sahinalp, C.; Todeschini, R.; Polishchuk, P.; Artemenko, A.; Kuz’min, V.; Martin, T. M.; Young, D. M.; Fourches, D.; Muratov, E.; Tropsha, A.; Baskin, I.; Horvath, D.; Marcou, G.; Muller, C.; Varnek, A.; Prokopenko, V. V.; Tetko, I. V. Applicability Domains for Classification Problems: Benchmarking of Distance to Models for Ames Mutagenicity Set. J. Chem. Inf. Model. 2010, 50 (12), 2094−2111. (76) Zhu, H.; Martin, T. M.; Ye, L.; Sedykh, A.; Young, D. M.; Tropsha, A. QSAR Modeling of Rat Acute Toxicity by Oral Exposure. Chem. Res. Toxicol. 2009, 22 (12), 1913−1921. (77) Young, D.; Martin, T.; Venkatapathy, R.; Harten, P. Are the Chemical Structures in Your QSAR Correct? QSAR Comb. Sci. 2008, 27 (11−12), 1337−1345. (78) Kraus, G. A.; Pollock, G. R., III; Beck, C. L.; Palmer, K.; Winter, A. H. Aromatics from pyrones: esters of terephthalic acid and isophthalic acid from methyl coumalate. RSC Adv. 2013, 3 (31), 12721−12725. (79) Kraus, G. A.; Riley, S.; Cordes, T. Aromatics from pyrones: parasubstituted alkyl benzoates from alkenes, coumalic acid and methyl coumalate. Green Chem. 2011, 13 (10), 2734−2736. (80) Smith, L. K.; Baxendale, I. R. Flow synthesis of coumalic acid and its derivatization. React. Chem. Eng. 2018, 3 (5), 722−732. (81) Wiley, R. H.; Knabeschuh, L. H. 2-Pyrones. XIII.1 The Chemistry of Coumalic Acid and its Derivatives. J. Am. Chem. Soc. 1955, 77 (6), 1615−1617.

O

DOI: 10.1021/acssuschemeng.8b05275 ACS Sustainable Chem. Eng. XXXX, XXX, XXX−XXX