Ind. Eng. Chem. Res. 2005, 44, 8883-8891


The Signature Molecular Descriptor. 5. The Design of Hydrofluoroether Foam Blowing Agents Using Inverse-QSAR

Derick C. Weis,† Jean-Loup Faulon,‡ Richard C. LeBorne,§ and Donald P. Visco, Jr.*,†

Department of Chemical Engineering, Tennessee Technological University, Box 5013, Cookeville, Tennessee 38505, Department of Computational Biology, Sandia National Laboratories, P.O. Box 969, MS 9951, Livermore, California 94551-9951, and Department of Mathematics, Tennessee Technological University, Box 5054, Cookeville, Tennessee 38505

In this work, a novel technique for molecular design is explored by generating compounds to replace R-141b in polyurethane foam blowing applications. This technique, which is known as the inverse quantitative structure-activity relationship (I-QSAR) method, is based on solving the inverse problem of molecular design, using a newly developed descriptor called Signature. In this work, we optimize the properties of the candidate solutions based on the normal boiling point and the vapor-phase thermal conductivity. After generating more than 3 million solutions with this technique, we have identified seven compounds for further study. Unlike other inverse design techniques, I-QSAR with Signature does not use a template compound and, thus, nonintuitive candidates with optimal predicted properties can result. The seven best candidates that form the focused database include straight chains and rings of a variety of sizes with one or two O atoms in the ring.

Introduction

The forward quantitative structure-activity relationship (QSAR) procedure simply defines an equation that relates a variable of interest (a dependent variable) to a selection of independent variables. (Note that, in keeping with implied convention, we use the term “activity” in QSAR to represent any property and not just a biological activity.) The dependent variable can be any property of interest (binding affinity, normal boiling point, IC50, etc.), whereas the independent variables are (normally) related to the structure of the compound. The arrival at a QSAR for a particular property involves training the parameters of the model against a set of data (training set), with normally a small portion of the data held back for validation of the model (test set), although other means also are used.1 After the QSAR is effectively trained and validated, one can use this model to predict the property value of a compound by determining the values for the independent variables in a straightforward manner.
Such forward QSARs form the basis for much property estimation that occurs throughout the chemical process industry. In fact, the work The Properties of Gases and Liquids, which was compiled by three chemical engineering authors, is basically a large reference for property estimation techniques.2 However, what if one were not interested in estimating a property of a particular compound, but instead in identifying a compound (or compounds) that had a particular property? This is, in essence, the inverse design problem that has been called the inverse quantitative structure-activity relationship (I-QSAR).3 Solutions to such problems have the potential to impact a wide variety of areas in science and engineering, because they allow for the design of compounds.4

Inverse design problems are ubiquitous and the reader has probably solved one this week. For example, if you have never tried ice cream and were offered it at a friend’s house, your choices might be vanilla or chocolate. You can brute force your way to decide the best candidate by trying both. However, if your first encounter with ice cream is at an establishment that boasts 31 flavors, the brute force technique becomes (for most) intractable. A strategy that, for example, allows you to decide between all vanilla-based or chocolate-based ice creams and, accordingly, eliminate the set based on a poor performer would be more efficient. Although the aforementioned example is naive, it illustrates a key point, namely that the solution strategy used is a function of the size of the solution space. In the field of molecular design, the solution space comes from all compounds that can be reasonably made from the various atoms in the periodic table. Hence, one needs a way to limit this solution space to efficiently arrive at candidate solutions. We describe a few of these techniques below.

As previously mentioned, the step to reduce the solution space from the entire chemical universe to that of a very small subset is a design decision that artificially (although practically) limits the number of solutions available. However, even very restrictive criteria can still lead to a very large solution space. For example, there are over 38 million total isomers for all alkanes up to C-22.5 Yet, there are over 16 million acyclic isomers alone for C4H10N2O3S2 with the restriction that the molecule contains three double bonds and multiple valencies on the S atom.6 Clearly, when one adds more

* To whom correspondence should be addressed. Tel.: (931) 372-3606. Fax: (931) 372-6352. E-mail: [email protected].
† Department of Chemical Engineering, Tennessee Technological University.
‡ Sandia National Laboratories.
§ Department of Mathematics, Tennessee Technological University.

10.1021/ie050330y CCC: $30.25 © 2005 American Chemical Society Published on Web 10/15/2005


Ind. Eng. Chem. Res., Vol. 44, No. 23, 2005

elements to the search space, combinatorial explosion dominates.

What if, instead of using a predefined search space, one uses a smaller (i.e., refined) database of compounds to increase the size of the overall database? Such techniques are the backbone of computer-aided molecular design (CAMD). Here, a problem is posed whose solution is a compound. Certain fragments are pooled in a database and connectivity rules are established. At this point, the techniques bifurcate. In one approach, a recursive scheme builds molecules from fragments, and the resulting molecules are tested for fitness.7 Note that the generation of these compounds can occur in different ways such as the use of genetic algorithms.8,9 In the other approach, structure feasibility relationships are developed and solved in terms of a mixed-integer nonlinear programming problem.10,11 Newer techniques incorporate the fitness function directly into the optimization problem, using topological indices that can be written in terms of vectors that describe connectivity.12 Depending on the expressions involved, an optimal solution may not be guaranteed.13 In addition, multiple solutions may correspond to the same compound.

Although I-QSAR and CAMD are, in essence, attempting to achieve the same goal, the literature has not overlapped as one would expect. In the former camp, there have been several methods over the past decade or so that have been proposed to solve the I-QSAR problem, with or without constraints. Kier and Hall published a series of papers in the early 1990s that described the I-QSAR methodology applied to chi indices.14-17 The QSARs that they developed had a maximum of four descriptors and example applications included the molar volume of alkanes and isonarcotic agents.
Zefirov and co-workers during this same time period used a similar technique but from the Kappa-shape index based on a count of paths.3 The QSARs they used were given in terms of three Kappa-shape descriptors and they considered three distinct systems, namely alkanes, alcohols, and small oxygen-containing compounds. In 2001, Bruggemann et al. demonstrated the use of Hasse diagrams combined with a similarity measure in the generation of solutions to the inverse problem involving toxicity of algae.18 Their method is based on partially ordered sets and does not assume a particular model for the QSAR. Garg and Achenie demonstrated a reasonable approach to the solution of the I-QSAR problem in 2001.19 Taking a target scaffold of an antifolate molecule for dihydrofolate reductase inhibition, these authors generated a QSAR for both activity and selectivity. They solved the I-QSAR problem to maximize selectivity through changing substituents on the scaffold, subject to a constraint of a threshold activity. This technique is not unlike the work of Camarda and Maranas, who used chi indices and small fragments to determine optimal polymer repeat units for a variety of properties, given a template.20 Finally, a work by Skvortsova et al. from 2003 demonstrated the use of the Hosoya index plus constraints on the number of C atoms for a system of 78 hydrocarbons in the solution of the I-QSAR problem.21

All of the methods described above for the solution to the I-QSAR problem (or for the CAMD problem) are limited in some way. First, the inverse process is limited to the descriptors for which inverse solutions can be generated, namely, the chi index, the Kappa index, Hasse diagrams, and the Hosoya index. Thus, the aforementioned solutions are constrained to the few properties and activities that can be mapped with these descriptors. The second limitation of the aforementioned solution techniques concerns the degeneracy of the solutions. It is not uncommon for a particular value of a topological index to correspond to a large number of compounds;22 thus, the number of solutions to a given inverse problem may become too large to be manageable. The number of solutions can be reduced if a scaffold or template is used; however, in this case, nonintuitive candidates are not generated.20 Other techniques to limit degeneracy do exist (for example, filtering with higher-order fragments to distinguish isomers), but they assume that the desired physical properties exist for the particular group of interest. An inverse-QSAR methodology based on the novel molecular descriptor, Signature, has been recently developed that addresses these issues and will be described next.23

This paper is arranged in the following manner. First, an introduction to Signature is provided, as well as a discussion on how Signature is an ideal descriptor for use in the solution to the I-QSAR problem. Second, a motivation for the problem at hand is given and a solution strategy outlined. Third, QSARs for both the boiling point and the vapor-phase thermal conductivity are developed for hydrofluoroethers (HFEs) and presented. The inverse design problem then is formulated and solved on the height-1 atomic Signatures from the boiling point data set with the distribution of solutions provided. The subset of solutions satisfying the design criterion on boiling point is next filtered through the QSAR on vapor-phase conductivity. This smaller subset is then filtered a third time using simple energetic constraints. The solutions in the resulting focused database are then inverted and, finally, two-dimensional (2D) molecular graphs of the best replacements are presented.
Note that, in all of the calculations (except where noted), we have used a dual-CPU Pentium 4 Xeon 2800 MHz computer with 1024 MB RAM. We report central processing unit (CPU) times for certain steps in the algorithm, where relevant.

Review of Signature

The Signature molecular descriptor has been discussed in detail in recent works.22,24-26 However, for the benefit of the reader, a brief introduction to Signature is provided in this section. At its heart, Signature is an efficient method to encode local topologies within a molecule, where the term “local” is expressed in Signature using the term “height”. The Signature of a molecule at the desired height is simply the sum of the individual atomic Signatures of that height, weighted by their occurrence. More formally, we define an atomic Signature, $^{h}\sigma_G(x)$, as the canonical subgraph of G (the 2D graph of the molecule) which consists of all atoms within a distance h of the root x. Here, h is the height of interest, whereas the roots are the nodes (i.e., atoms) of G. Similarly, the molecular Signature, $^{h}\Sigma_G$, is then the set of all unique atomic Signatures and the occurrence with which they appear in the molecular graph. The distinction between atomic and molecular Signature, as well as the meaning of height, is best shown in Figure 1.

Ideally, the perfect descriptor (or descriptor set) to use in I-QSAR techniques should possess the following


Figure 1. Methanol and its corresponding atomic and molecular Signatures at heights 0, 1, and 2. The molecular Signature is, ultimately, the sum of the atomic Signatures of all roots weighted by the occurrence number. The carbon root and its atomic Signatures are shown in the figure.

qualities: (i) create a useful QSAR, (ii) have a low degeneracy, and (iii) be invertible.23,27 The first quality, which involves the creation of a useful QSAR, is self-evident. If the descriptor does not describe the dataset well (i.e., does not create a useful QSAR), the inverse solutions will have “desired” values with little utility. The second quality, which involves low degeneracy, is important because a descriptor with a high degeneracy does not discriminate between optimal compounds and uninteresting compounds. Finally, and most importantly, after a set of solutions has been obtained for a target value, there must be a way to generate an actual compound from a solution efficiently and effectively.

Signature has already been demonstrated to possess the three qualities previously described. First, Signature compared favorably with commercially available descriptors (via the Molconn-Z program)28 on small studies (the activity of 102 compounds against HIV-1 protease) and large studies (the octanol/water partitioning coefficient of 104 compounds).26 Signature has also been shown to predict protein-protein interactions with accuracies similar to other bioinformatics techniques.29 In fact, Signature encapsulates information from which other molecular descriptors can be computed.26 Second, Signature was shown to be the least degenerate of many (several dozen) other popular descriptors evaluated for a variety of molecular series (alkanes, alcohols, fullerenes, and peptides), and, in fact, the user can control the degeneracy by the choice of height selected.22 Third, and foremost, Signature provides a way to go from numerical solutions of the I-QSAR problem to actual structures that correspond to solutions.22 Indeed, the main advantage of Signature versus other molecular descriptors is its readiness for inverse problems.
An algorithm to both enumerate and sample chemical structures corresponding to solution vectors (i.e., molecular Signatures) has already been developed. In a recent work, we have used the I-QSAR procedure with Signature on a very small set of LFA-1/ICAM-1 peptide inhibitors and have developed a small, focused database of compounds, several of which are predicted to be more potent than the strongest inhibitor in the training set. Two of the more potent inhibitors were synthesized and tested in vivo, confirming them to be the strongest inhibiting peptides for ICAM-1 to date.23
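The definitions of atomic and molecular Signature reviewed in this section can be illustrated with a short Python sketch for the methanol example of Figure 1. This is a simplified illustration and not the in-house translator mentioned later in the paper: the atom numbering, the function names, and the use of sorted strings as the canonical ordering are all assumptions made here, and the recursion is only adequate for acyclic molecules.

```python
from collections import Counter

# Methanol (CH3OH) as a labelled graph. The atom numbering is arbitrary
# (an assumption of this sketch): 0 = C, 1 = O, 2-4 = H on C, 5 = H on O.
ELEMENTS = {0: "C", 1: "O", 2: "H", 3: "H", 4: "H", 5: "H"}
BONDS = [(0, 1), (0, 2), (0, 3), (0, 4), (1, 5)]


def neighbours(bonds):
    """Build an adjacency map from a list of bonded atom pairs."""
    adj = {}
    for a, b in bonds:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def atomic_signature(root, height, elements, adj, parent=None):
    """Return the height-h atomic Signature of `root` as a string.

    Branches are sorted as strings to obtain one fixed ordering; this
    stands in for the canonicalization step and ignores ring closures.
    """
    label = f"[{elements[root]}]"
    if height == 0:
        return label
    branches = sorted(
        atomic_signature(n, height - 1, elements, adj, parent=root)
        for n in adj[root]
        if n != parent
    )
    return label + (f"({''.join(branches)})" if branches else "")


def molecular_signature(height, elements, bonds):
    """Count how often each atomic Signature of the given height occurs."""
    adj = neighbours(bonds)
    return Counter(atomic_signature(a, height, elements, adj) for a in elements)


if __name__ == "__main__":
    for h in (0, 1, 2):
        print(h, dict(molecular_signature(h, ELEMENTS, BONDS)))
```

At height 1 the carbon root of methanol yields `[C]([H][H][H][O])`, the same bracket notation that is used for the hydrofluoroether Signatures in Table 2.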

Figure 2. Flowchart showing the inverse quantitative structure-activity relationship (I-QSAR) method using Signature for this problem.

Problem Motivation

Sekiya and Misaki recently published a work opining on the potential of hydrofluoroethers to replace chlorofluorocarbons (CFCs), hydrochlorofluorocarbons (HCFCs), and perfluorocarbons (PFCs).30 They evaluated dozens of hydrofluoroethers (HFEs) through examination of properties such as density, surface tension, specific heat, thermal stability, boiling point, flammability, thermal conductivity, etc. for use in various processes. Their approach is typical of this industry, in that property information is gathered, a “best” selection is made based on the available data, and in-application performance (as a solvent, in a foam, or as a refrigerant) is evaluated.

As a first foray using Signature for inverse compound design in a nonbiological application, we have chosen to develop a focused database of HFEs to replace the HCFC R-141b in polyurethane insulating foam applications. R-141b has a nonzero ozone depletion potential (because it contains chlorine), and, thus, long-term replacements that do not contain chlorine are being evaluated. To this end, we have chosen two properties of R-141b to use as targets, namely its normal boiling point (Tb = 305 K) and vapor-phase thermal conductivity (λv = 11.82 mW m⁻¹ K⁻¹ at 50 °C). We have chosen these two properties because they are very important properties in evaluating a compound’s viability as a polyurethane foam blowing agent, especially in insulation applications.31

After a literature search, we were able to find the boiling point of 76 HFEs and the vapor-phase thermal conductivity of 15 HFEs. We chose the larger set to use for the inverse problem and subsequently used the vapor-phase thermal conductivity as a screen from the initial set. Such a procedure is similar to that where one system constraint is made the objective function and the remainder provides solution constraints on the objective function. A flowchart denoting the overall I-QSAR procedure for this problem is provided in Figure 2.
A second design choice involved the appropriate atomic Signature height to use. As the height of the atomic Signature increases, fewer compounds correspond to each inverse solution that is generated. Conversely, a height-0 atomic Signature (just the molecular formula) will map onto many


Table 1. Listing of the 76 Hydrofluoroethers (HFEs) Used in the Normal Boiling Point Study, along with Their Experimental Normal Boiling Point (Tb) Values

structure | Tb (K) | reference
CF3-CF2-CF2-CF2-O-CH2-CH2-CH3 | 369.30 | 41
CF3-CF2-CF2-CF2-CF2-O-CH3 | 358.08 | 41
CF3-CF2-CF2-CF2-CF2-O-CH2-CH3 | 373.00 | 41
CF3-CF(OCF3)-CH2-CH3 | 319.09 | 41
(CF3)3C-O-CH3 | 326.79 | 41
(CF3)2CF-CF2-CF2-O-CH3 | 357.90 | 41
(CF3)2CF-CF2-CF2-O-CH2-CH3 | 373.48 | 41
CF3-CF(OCF3)-CH2-CHF2 | 331.78 | 41
(CF3)3C-O-CH2-CH3 | 340.27 | 41
CF3-CHF-CF2-O-CH2-CF3 | 345.87 | 41
CF3-CHF-CF2-O-CH2-CF2-CHF2 | 379.07 | 41
CF3-CHF-CF2-O-CH2-CF2-CF3 | 360.64 | 41
CHF2-CF2-CH2-O-CF2-CHF2 | 366.32 | 41
CHF2-CF2-CF2-CF2-CH2-O-CH3 | 395.83 | 41
CH2F-CF2-O-CH3 | 316.93 | 41
CF3-CF2-CH2-O-CH3 | 321.55 | 41
CHF2-CF2-CH2-O-CHF2 | 348.60 | 41
CF3-CF2-CH2-O-CHF2 | 319.09 | 41
(CF3)2-CH-O-CH3 | 324.10 | 41
CHF2-CF2-O-CH2-CH3 | 329.80 | 41
CF3-CH2-O-CF2-CH2F | 338.18 | 41
CHF2-CF2-O-CH2-CF3 | 329.37 | 41
CF3-CH2-O-CH2-CF3 | 336.91 | 41
CHF2-CF2-O-CH2-CHF2 | 352.13 | 41
CH2F-CF2-O-CHF2 | 316.20 | 41
CF3-CHF-CF2-O-CH3 | 327.47 | 41
CHF2-CH2-O-CHF2 | 328.49 | 41
CHF2-CF2-CF2-O-CH3 | 341.02 | 41
CF3-CF2-CH2-O-CF2-CHF2 | 343.40 | 41
CHF2-CF2-CH2-O-CF3 | 319.21 | 41
(CF3)2-CH-O-CH2F | 331.73 | 41
CF3-CF2-O-CH2-CHF2 | 318.53 | 41
(CF3)2-CH-CF2-O-CH3 | 343.07 | 41
CF3-CHF-CF2-CH2-O-CF3 | 337.62 | 41
CF3-CF2-CF2-O-CH2-CF3 | 325.47 | 41
CF3-CF2-CF2-O-CH2-CF2-CHF2 | 356.97 | 41
CF3-CF2-CH2-O-CF2-CF3 | 319.99 | 41
CF3-CF2-CF2-O-CH2-CHF2 | 340.38 | 41
CHF2-CH2-O-CH3 | 321.65 | 41
CHF2-CF2-CH2-O-CH3 | 347.50 | 41
CF3-CF2-CF2-CH2-O-CH3 | 344.13 | 41
CF3-CHF-CF2-CH2-O-CH3 | 360.65 | 41
CHF2-CF2-O-CH2F | 326.74 | 41
CF3-CF2-CF2-O-CH2F | 316.42 | 41
CF3-CF2-CF2-CF2-O-CH2-CH3 | 350.04 | 41
CF3-CH2-O-CH3 | 304.77 | 42
CF3-CHF-O-CHF2 | 296.50 | 42
(CF3)2-CF-O-CH3 | 302.56 | 42
CF3-CF2-O-CH3 | 278.74 | 42
CF3-CF2-CF2-O-CH3 | 307.40 | 42
CH3-CH2-CH2-O-CHF2 | 324.15 | 43
CF3-CH2-O-CH2-CH3 | 323.45 | 43
CHF2-CH2-O-CH2-CH3 | 338.15 | 43
CHF2-O-CF2-CHF2 | 298.15 | 43
CHF2-O-CHF-CHF2 | 325.65 | 43
CF3-CF2-CHF-O-CF3 | 293.15 | 43
CF3-CF2-CHF-O-CHF2 | 309.15 | 43
CHF2-CF2-O-CF2-CHF2 | 327.15 | 43
CF3-CHF-O-CF2-CHF2 | 315.15 | 43
CHF2-CF2-O-CHF-CHF2 | 341.15 | 43
CHF2-CF2-CHF-O-CHF2 | 341.15 | 43
CF3-CF2-CH2-O-CH2F | 333.15 | 43
CHF2-CH2-O-CH2-CHF2 | 376.15 | 43
CF3-CHF-O-CF3 | 263.54 | 44
CF3-CF2-O-CF2-CHF2 | 295.15 | 44
CHF2-CHF-O-CH3 | 318.4 | 45
CF3-O-CH2-CF2-CF3 | 299.36 | 30
CF3-CH2-O-CF2-CF3 | 300.91 | 30
CF3-CF2-O-CF2-CH2F | 308.63 | 30
CF3-CH2-O-CHF2 | 299.25 | 30
CHF2-CF2-O-CH3 | 310.34 | 46
CF3-CF(OCF3)-CH2-CF3 | 323.64 | 41
(CF3)2-CH-O-CHF2 | 315.27 | 41
CHF2-CF2-CH2-O-CF2-CF3 | 336.14 | 41
CF3-CF2-CF2-O-CH2-CF2-CF3 | 343.95 | 41
CF3-CF2-O-CH2-CH3 | 301.26 | 42

compounds. In addition, the information content of the QSARs developed is also a function of the atomic Signature height. For example, in preliminary work, we have determined that a data set of just two atoms (C and H) with only a height-0 Signature resulted in the maximum information content, whereas a data set with many atoms shifted the optimal atomic Signature height out to 2 or 3. Taking into account both of the previously discussed features, we have chosen to use atomic Signature height 1 during this study.

The first step in the I-QSAR procedure using Signature is to determine the database of unique height-1 atomic Signatures that exist among the 76 compounds (see Table 1). This is performed using an in-house translator that converts different file formats (SMILES, polygraf, etc.) into atomic Signatures with a user-specified height.32 From the 76 compounds, we found 22 unique height-1 atomic Signatures (see Table 2). After the database of height-1 atomic Signatures exists, constraint equations are needed so we can reconstruct compounds from solutions generated. There are two types of constraint equations, namely the graphicality equation and the consistency equations. These are briefly reviewed here; for a more complete discussion, the reader is referred to our previous work.23

The graphicality equation is a necessary condition on whether at least one connected graph can be constructed from the atomic Signatures. This equation, taken directly from graph theory, uses only the degree of the vertices in the graph. The graphicality equation can be computed directly from the height-0 molecular Signature. Using the 22 height-1 atomic Signatures from

Table 2. Listing of the 22 Height-1 Atomic Signatures Obtained from the Normal Boiling Point Training Set

symbol | height-1 Signature
x1 | [C]([C][C][C][F])
x2 | [C]([C][C][C][H])
x3 | [C]([C][C][C][O])
x4 | [C]([C][C][F][F])
x5 | [C]([C][C][F][H])
x6 | [C]([C][C][F][O])
x7 | [C]([C][C][H][H])
x8 | [C]([C][C][H][O])
x9 | [C]([C][F][F][F])
x10 | [C]([C][F][F][H])
x11 | [C]([C][F][F][O])
x12 | [C]([C][F][H][H])
x13 | [C]([C][F][H][O])
x14 | [C]([C][H][H][H])
x15 | [C]([C][H][H][O])
x16 | [C]([F][F][F][O])
x17 | [C]([F][F][H][O])
x18 | [C]([F][H][H][O])
x19 | [C]([H][H][H][O])
x20 | [F]([C])
x21 | [H]([C])
x22 | [O]([C][C])
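The entries of Table 2 carry enough information to derive the coefficients of the constraint equations (eqs 1-5) mechanically, and the following Python sketch shows one way this could be done. The function names are hypothetical, and the parsing relies only on the bracket notation of Table 2. Note that the carbon-oxygen balance obtained this way carries a factor of 2 on x22, because [O]([C][C]) lists two carbon neighbors.

```python
import re

# Height-1 atomic Signatures x1..x22, transcribed from Table 2.
TABLE_2 = [
    "[C]([C][C][C][F])", "[C]([C][C][C][H])", "[C]([C][C][C][O])",
    "[C]([C][C][F][F])", "[C]([C][C][F][H])", "[C]([C][C][F][O])",
    "[C]([C][C][H][H])", "[C]([C][C][H][O])", "[C]([C][F][F][F])",
    "[C]([C][F][F][H])", "[C]([C][F][F][O])", "[C]([C][F][H][H])",
    "[C]([C][F][H][O])", "[C]([C][H][H][H])", "[C]([C][H][H][O])",
    "[C]([F][F][F][O])", "[C]([F][F][H][O])", "[C]([F][H][H][O])",
    "[C]([H][H][H][O])", "[F]([C])", "[H]([C])", "[O]([C][C])",
]


def parse(signature):
    """Split a height-1 Signature into (root element, neighbor elements)."""
    atoms = re.findall(r"\[([A-Z][a-z]?)\]", signature)
    return atoms[0], atoms[1:]


def bond_balance(a, b):
    """Coefficients of: (a-b bonds seen from b) - (a-b bonds seen from a) = 0."""
    row = []
    for signature in TABLE_2:
        root, nbrs = parse(signature)
        if root == a:
            row.append(-nbrs.count(b))
        elif root == b:
            row.append(nbrs.count(a))
        else:
            row.append(0)
    return row


def self_bond_row(a):
    """Count of a-a half-bonds per Signature; the weighted sum must be even."""
    return [parse(s)[1].count(a) if parse(s)[0] == a else 0 for s in TABLE_2]


def graphicality_row():
    """(degree - 2) per Signature; the weighted sum plus 2 must be even."""
    return [len(parse(s)[1]) - 2 for s in TABLE_2]


def dot(row, x):
    return sum(c * v for c, v in zip(row, x))


def satisfies_constraints(x):
    """Check a vector of 22 occurrence numbers against all five conditions."""
    balanced = all(dot(bond_balance("C", e), x) == 0 for e in "FHO")
    even_cc = dot(self_bond_row("C"), x) % 2 == 0
    graphical = (dot(graphicality_row(), x) + 2) % 2 == 0
    return balanced and even_cc and graphical


if __name__ == "__main__":
    # CF3-CF2-O-CH3 from Table 1: x9 = x11 = x19 = x22 = 1, x20 = 5, x21 = 3.
    x = [0] * 22
    for index, count in {9: 1, 11: 1, 19: 1, 20: 5, 21: 3, 22: 1}.items():
        x[index - 1] = count
    print(satisfies_constraints(x))
```

These conditions are necessary rather than sufficient, which is consistent with the description of the graphicality equation given in the text.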

Table 2, the graphicality equation in this problem becomes

$$(2x_1 + 2x_2 + 2x_3 + 2x_4 + 2x_5 + 2x_6 + 2x_7 + 2x_8 + 2x_9 + 2x_{10} + 2x_{11} + 2x_{12} + 2x_{13} + 2x_{14} + 2x_{15} + 2x_{16} + 2x_{17} + 2x_{18} + 2x_{19} - x_{20} - x_{21} + 2) \,\%\, 2 = 0 \qquad (1)$$

where the “%” indicates modulus. The consistency equations are a collection of constraints that guarantee that a bond in one atomic Signature will match up with a bond in another atomic Signature, albeit in reverse order. Blind reconstruction of the molecule requires equations to enforce these conditions and this is done by matching bonds between two atoms of one Signature to the bonds involving the same atoms in all other Signatures. Much more detail and a simple example illuminating this point is provided


elsewhere.23 For our problem, the consistency equations are provided below.

$$-x_1 - 2x_4 - x_5 - x_6 - 3x_9 - 2x_{10} - 2x_{11} - x_{12} - x_{13} - 3x_{16} - 2x_{17} - x_{18} + x_{20} = 0 \qquad (2)$$

$$-x_2 - x_5 - 2x_7 - x_8 - x_{10} - 2x_{12} - x_{13} - 3x_{14} - 2x_{15} - x_{17} - 2x_{18} - 3x_{19} + x_{21} = 0 \qquad (3)$$

$$-x_3 - x_6 - x_8 - x_{11} - x_{13} - x_{15} - x_{16} - x_{17} - x_{18} - x_{19} + 2x_{22} = 0 \qquad (4)$$

$$(3x_1 + 3x_2 + 3x_3 + 2x_4 + 2x_5 + 2x_6 + 2x_7 + 2x_8 + x_9 + x_{10} + x_{11} + x_{12} + x_{13} + x_{14} + x_{15}) \,\%\, 2 = 0 \qquad (5)$$

Equations 2-4 balance the C-F, C-H, and C-O bonds counted from either end of the bond (each [O]([C][C]) Signature accounts for two C-O bonds), and eq 5 requires the number of C-C half-bonds to be even. All five of the constraint equations are Diophantine in nature, in that both their coefficients and their solutions are integers. Implicit in the aforementioned constraint equations is the constraint that the variables are non-negative. An algorithm adapted from Contejean and Devie,33 which uses a geometric interpretation of Fortenbacher’s algorithm,34 was implemented to solve this system. Note that we only solve eqs 2-4 initially and use the modulus equations (eqs 1 and 5) later as a filter on the solutions. This technique has proven more efficient in other studies within our group.

The output of the Diophantine solver was a set of 64 basis vectors and required only 0.01 s of CPU time to complete. Using a reasonable range on the number of atoms in the desired focused database (here, 4-31 atoms), we have generated 6 610 909 potential solutions as linear combinations of the basis vectors in 48 min of CPU time. These potential solutions were filtered against the modulus constraints and resulted in 3 305 767 solutions to the constraint equations previously mentioned (eqs 1-5) in 140 s (2 min, 20 s) of CPU time. Note that, as required, the 76 compounds from the training set are among the solutions generated.

Although we have generated over 3 million solutions from this procedure, at this point we cannot evaluate their “goodness” because we need a way to determine the properties of the compounds. Accordingly, the next steps are to make QSARs for both the boiling point and the vapor-phase thermal conductivity.

Boiling-Point QSAR.
From the experimental information (provided in Table 1), using the 22 height-1 atomic Signatures as independent variables (provided in Table 2), a QSAR was generated using a forward-stepping, multiple linear regression technique, as done in previous work with Signature.35 This technique is computationally efficient and can control the number of independent variables of the QSAR equation. The resulting QSAR had 10 descriptors (i.e., height-1 atomic Signatures), as given below:

$$T_b\ (\mathrm{K}) = 177.87 + 10.423x_1 + 0.8514x_4 - 21.462x_7 + 5.5259x_{10} - 52.053x_{14} - 13.881x_{15} - 2.6677x_{16} - 27.655x_{19} + 10.244x_{20} + 27.662x_{21} \qquad (6)$$

Statistics on the correlative ability of this QSAR are provided in Table 3, as well as the relationship being presented graphically in Figure 3. For the purposes of this preliminary study, a simple way to provide some assessment on the predictive nature of this QSAR is to perform a cross-validation study. To this end, we report leave-one-out q² statistics for this QSAR and provide

Figure 3. Normal boiling-point quantitative structure-activity relationship (QSAR) plotted with a 45° line; the predicted boiling point (Tb) from eq 6 is plotted against the experimental values.

Table 3. Overall Statistics for the Two Quantitative Structure-Activity Relationships (QSARs) Developed

QSAR | F | R² | s² | q²
boiling point | 0.813 | 0.947 | 5.58 | 0.924
thermal conductivity | 0.128 | 0.864 | 0.23 | 0.589
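With eq 6 available, the solve, filter, and score sequence described for the inverse problem can be illustrated in miniature. The Python sketch below is not the basis-vector algorithm adapted from Contejean and Devie; it swaps in a brute-force enumeration over the carbon Signatures x1-x19, fixes x20, x21, and x22 from the bond balances (eqs 2-4), applies the two modulus conditions (eqs 1 and 5), and scores each vector with eq 6. Several points are assumptions of this sketch: the limit of three carbon atoms, the function names, the use of two C-O bonds per [O]([C][C]) oxygen in the oxygen balance, and negative signs on the x7 and x16 terms of eq 6. The 300-310 K window is the one quoted in the caption of Figure 6.

```python
from itertools import combinations_with_replacement

# (nC, nF, nH, nO) neighbor counts of the carbon Signatures x1..x19 (Table 2).
CARBON_TYPES = [
    (3, 1, 0, 0), (3, 0, 1, 0), (3, 0, 0, 1), (2, 2, 0, 0), (2, 1, 1, 0),
    (2, 1, 0, 1), (2, 0, 2, 0), (2, 0, 1, 1), (1, 3, 0, 0), (1, 2, 1, 0),
    (1, 2, 0, 1), (1, 1, 2, 0), (1, 1, 1, 1), (1, 0, 3, 0), (1, 0, 2, 1),
    (0, 3, 0, 1), (0, 2, 1, 1), (0, 1, 2, 1), (0, 0, 3, 1),
]

# Eq 6 as {1-based Signature index: coefficient}. The negative signs on
# x7 and x16 are an assumption of this sketch.
EQ6_INTERCEPT = 177.87
EQ6 = {1: 10.423, 4: 0.8514, 7: -21.462, 10: 5.5259, 14: -52.053,
       15: -13.881, 16: -2.6677, 19: -27.655, 20: 10.244, 21: 27.662}


def candidates(max_carbons):
    """Yield 22-component occurrence vectors that pass eqs 1-5."""
    for n in range(1, max_carbons + 1):
        for combo in combinations_with_replacement(range(19), n):
            n_c, n_f, n_h, n_o = (sum(CARBON_TYPES[i][k] for i in combo)
                                  for k in range(4))
            # Eq 5 (even C-C half-bonds) and an integer x22 from eq 4.
            if n_c % 2 or n_o % 2:
                continue
            # Eq 1 (graphicality) with x20 = n_f and x21 = n_h.
            if (2 * n - n_f - n_h + 2) % 2:
                continue
            x = [0] * 22
            for i in combo:
                x[i] += 1
            x[19], x[20], x[21] = n_f, n_h, n_o // 2  # eqs 2-4
            yield x


def predict_tb(x):
    """Predicted normal boiling point (K) from eq 6."""
    return EQ6_INTERCEPT + sum(c * x[i - 1] for i, c in EQ6.items())


def screen(max_carbons, low=300.0, high=310.0):
    """Keep the candidates whose predicted boiling point lies in the window."""
    return [x for x in candidates(max_carbons) if low <= predict_tb(x) <= high]


if __name__ == "__main__":
    print(len(list(candidates(3))), "candidate vectors with up to 3 C atoms")
    print(len(screen(3)), "predicted to boil between 300 K and 310 K")
```

Because the constraint equations are necessary conditions only, some of the enumerated vectors will not correspond to a connected molecule; in the procedure of the paper, that question is settled when solutions are inverted into structures.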

Table 4. Listing of the 15 HFEs Used in the Vapor-Phase Thermal Conductivity Study, along with Their Experimental Vapor-Phase Thermal Conductivity (λv) Values

structure | λv (mW m⁻¹ K⁻¹) | reference
CF3-CF2-CH2-O-CHF2 | 12.93 | 46
(CF3)2-CH-O-CH3 | 12.67 | 46
CHF2-CF2-O-CH2-CF3 | 12.37 | 46
CH2F-CF2-O-CHF2 | 13.08 | 46
(CF3)2-CH-O-CHF2 | 13.46 | 46
(CF3)3C-O-CH3 | 12.00 | 46
CF3-CF2-O-CH3 | 13.81 | 46
CF3-CF2-CF2-O-CH3 | 12.79 | 46
CF3-CHF-O-CF3 | 13.44 | 46
CHF2-O-CHF2 | 13.66 | 46
CF3-CF2-O-CF2-CHF2 | 12.68 | 46
CHF2-CF2-O-CH3 | 13.34 | 46
(CF3)3C-O-CH2-CH3 | 11.85 | 46
(CF3)2-CF-O-CH3 | 13.01 | 46
CF3-CH2-O-CHF2 | 13.75 | 46

this information in Table 3. These results indicate that over-fitting has been mitigated and that boiling-point prediction errors are reasonable, compared to the correlation results.

Vapor-Phase Thermal Conductivity QSAR. Experimental vapor-phase thermal conductivity (λv) data for 15 HFEs were obtained (see Table 4), and another QSAR was created in a manner similar to the boiling point QSAR. Note that this QSAR was smaller due to the limited experimental data available. Because of this, there were only 17 unique atomic Signatures instead of 22, with the five absent being as follows: x1, x2, x5, x7, and x18. Since all of the compounds in the vapor-phase thermal conductivity training set were also in the normal boiling-point training set, the notation for the atomic Signatures remained the same.
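Both QSARs are assessed with a leave-one-out q² statistic (Table 3). The following Python sketch shows one common way to compute such a statistic for a multiple linear regression: each compound is left out in turn, the model is refit on the remainder, and the squared prediction errors are accumulated. The paper does not state its exact formula, so the definition used here, q² = 1 - PRESS/SS_total, is an assumption, as are the function names. The toy data are synthetic and are not taken from the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    ata = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    aty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(ata, aty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_q2(rows, y):
    """Leave-one-out q2 = 1 - PRESS / total sum of squares about the mean."""
    press = 0.0
    for i in range(len(y)):
        coef = fit(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, rows[i])) ** 2
    mean = sum(y) / len(y)
    ss_total = sum((t - mean) ** 2 for t in y)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    # Synthetic toy data (not from the paper): two descriptor counts per
    # compound and a property with a small hand-made perturbation.
    toy_rows = [[0, 1], [1, 0], [1, 1], [2, 1], [1, 2], [3, 0], [2, 2]]
    toy_y = [1.1, 4.9, 4.0, 7.2, 2.9, 11.0, 6.1]
    print(round(loo_q2(toy_rows, toy_y), 3))
```

A q² close to the R² of the full fit, as reported for the boiling-point QSAR, indicates that predictions for left-out compounds are nearly as good as the correlation itself.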

$$\lambda_v\ (\mathrm{mW\ m^{-1}\ K^{-1}}) = 16.67 - 0.6205x_3 + 0.2213x_4 + 0.4336x_9 + 0.7586x_{14} - 0.4427x_{20} - 0.4543x_{21} \qquad (7)$$

The predicted versus experimental values for the vapor-phase thermal conductivity QSAR are given in Figure 4 with the statistics provided in Table 3. Once again, we have performed cross-validation studies using


Figure 4. Vapor-phase thermal conductivity QSAR plotted with a 45° line; the predicted vapor-phase thermal conductivity (λv) from eq 7 is plotted against the experimental values.

Figure 6. Distribution of inverse solutions with predicted normal boiling points between 300 K and 310 K, displayed as a function of the predicted vapor-phase thermal conductivity (λv). Those compounds with predicted vapor-phase thermal conductivities of 99.7%. This filtering step required 17.2 s of CPU time to complete.

Screen for Vapor-Phase Thermal Conductivity. The vapor-phase thermal conductivity (λv) of the blowing agent (which is present in the cells of the foam matrix) accounts for more than half of the overall K-factor of the foam. Accordingly, small changes in the λv value are important (the lower the λv value, the better). Because the vapor-phase thermal conductivity of R-141b is 11.82 mW m⁻¹ K⁻¹ at 50 °C, we used a slightly higher value (λv = 12.00 mW m⁻¹ K⁻¹ at 50 °C) so that we did not arbitrarily remove potentially useful candidates from the focused library. Accordingly, the 7134 solutions that passed through the boiling-point QSAR screen were evaluated using the vapor-phase thermal conductivity QSAR that we created. This distribution is provided in Figure 6, with 5705 solutions being identified as having a λv value of