Anal. Chem. 1988, 60, 2249-2253
Prediction of Gas Chromatographic Retention Indexes for Diverse Drug Compounds
R. H. Rohrbaugh and P. C. Jurs*
Department of Chemistry, Penn State University, University Park, Pennsylvania 16802
A quantitative structure-retention relationship (QSRR) study was performed on a set of 144 diverse drugs. Two equations were generated and evaluated for predictive abilities. A nine-term model was generated with a multiple R of 0.98 and a relative standard error (rse) of approximately 5%. A second model was also developed, containing only five terms (R = 0.97; rse = 6%). Although the first model is marginally better, the second model allows estimation of retention indexes based solely on counts of atoms and bond types. Additional tests show both models to have good predictive abilities. The results of the study indicated that the methodology used can be applied successfully to sets of heterogeneous compounds.
Toxicological screening for drug compounds is an important task in many clinical laboratories. Typically, packed column or capillary column gas chromatography is used because of its speed, simplicity, and resolving power. Preliminary identification of compounds is frequently based on the observed retention of a compound. This value can be compared to the retention observed for known standards. However, standard compounds of many drugs are frequently unavailable or extremely expensive due to strict regulations governing the distribution of these compounds. In such cases, it would be useful to have the ability to estimate the chromatographic retention behavior based solely on structural information. The process of relating chemical structure to chromatographic retention comprises a field of research known as quantitative structure-retention relationships (QSRR). This field has grown rapidly in the past few years, as evidenced by the number of papers and the recent publication of a book (1). The approach has been applied successfully to many compound classes and chromatographic systems (2-6). One of the current drawbacks of using QSRR is that they are typically developed only for a homogeneous set of compounds in a well-defined chromatographic system. Work in this research group has been aimed at exploring the limitations of this methodology with respect to data set homogeneity. QSRR was found to be quite successful for describing the gas chromatographic retention characteristics of a set of simple monoolefins (7) and polychlorinated biphenyl compounds (8), two very homogeneous sets of compounds. Further studies have shown the methods to be useful for modeling the gas chromatographic retention of polycyclic aromatic compounds (PACs) and their nitrated analogues (9, 10). These data sets were slightly more heterogeneous, yet the models developed were still very strong.
The current study of 144 diverse drug compounds is intended to explore the quality of the models generated from a substantially more heterogeneous set of compounds.
EXPERIMENTAL SECTION
Data Set of Compounds. The data used in this study were reported by Anderson and Stafford (11). The Kovats retention indexes were determined for 175 drugs and metabolites by using capillary gas chromatography on a 15-m SE-30 column. Oven
temperature was programmed from 100 °C to 295 °C at 5 °C/min. Helium carrier gas was used at 45 cm/s (100 °C). The compounds range from small, simple drugs such as cadaverine, I, to complex macrocyclic molecules such as strychnine, II.
[Structures I (cadaverine, H2N(CH2)5NH2) and II (strychnine)]
The variation in
the size of the compounds may be characterized by using the molecular weight range. The drugs studied ranged in molecular weight from 102 to 433, with the average molecular weight being 273. Retention indexes for the 175 compounds ranged from 974 to 3326 retention units, with a mean retention index of 2142. The experimental error in the retention indexes is reported to be <1.0 unit, with more variability for those compounds with larger retention indexes. It should be noted that although retention indexes are only weakly affected by temperature, the wide range of temperatures used in the program (100 °C to 295 °C) may introduce a biased variability.
The QSRR study was performed in three stages: (1) entry and storage of the structures and the associated retention indexes, (2) generation of quantitative molecular structure descriptors, and (3) generation and testing of linear models. All work reported in this study was performed with the ADAPT software system (12) on the Penn State University Chemistry Department PRIME 750 minicomputer.
Structure Entry. The compounds' structures and associated retention indexes were entered into computer files by using the structure entry capabilities of the ADAPT software system (13). This was achieved by sketching two-dimensional representations of the structures onto a graphics terminal. The program generates a compact connection table from this sketch and stores this information along with two-dimensional coordinates for each atom of the structure. Of the 175 compounds reported, only 144 were entered because the structures of some of the reported compounds could not be determined unambiguously. The 144 compounds used, along with their respective retention indexes, are listed in Table I.
Descriptor Generation. The second step toward generating a model relating molecular structure to chromatographic retention indexes is the quantitative description of the structure.
Molecular descriptors are measured or calculated values that attempt to quantitatively encode important features of a compound's chemical structure. This information may be topological, geometrical, physicochemical, or electronic. Because of the conformational flexibility of the majority of the compounds in this study, three-dimensional modeling of the chemical structures was not feasible. As a result, only topological, physicochemical, and simple electronic descriptors were utilized. The majority of descriptors generated for this study were topological. One reason for choosing these descriptors is their ease of calculation. Many can be determined simply by inspecting a drawing of the compound's structure. The others are easily generated from the stored connection tables. Fragment Descriptors. Fourteen descriptors were calculated from simple counts of atoms, bonds, or rings in the structure. These include number of atoms, number of carbons, number of oxygens, number of nitrogens, number of sulfurs, number of chlorines, number of bonds, number of single bonds, number of
0003-2700/88/0380-2249$01.50/0 © 1988 American Chemical Society
ANALYTICAL CHEMISTRY, VOL. 60, NO. 20, OCTOBER 15, 1988
Table I. Compounds Used in Drug Study

no.   compound                  RI
1     cadaverine                974
2     cyclopentamine            1065
3     amphetamine               1111
4     phenethylamine            1133
5     phentermine               1138
6     methamphetamine           1161
7     methenamine               1191
8     amantadine                1211
9     mephentermine             1236
10    phenelzine                1265
11    phenylpropanolamine       1287
12    clortermine               1305
13    nicotine                  1315
14    ephedrine                 1336
15    phenmetrazine             1409
16    phendimetrazine           1431
17    ecgonine methyl ester     1439
18    diethyl propion           1470
19    methylprylon              1489
20    benzocaine                1513
21    mephenesin                1518
22    mescaline                 1657
23    lindane                   1682
24    methylphenidate           1695
25    methoxamine               1700
26    meperidine                1720
27    dimethyltryptamine        1729
28    caffeine                  1606
29    cotarnine                 1758
30    alphaprodine              1777
31    pheniramine               1779
32    ketamine                  1798
33    prilocaine                1800
34    benzphetamine             1806
35    ethoheptazine             1823
36    diphenhydramine           1842
37    lidocaine                 1842
38    phencyclidine             1860
39    diethyltryptamine         1875
40    doxylamine                1888
41    orphenadrine              1915
42    phenyltoloxamine          1915
43    phenyramidol              1932
44    tripelenamine             1949
45    methapyrilene             1951
46    naphazoline               1958
47    thenyldiamine             1963
48    chlorpheniramine          1972
49    procaine                  1978
50    mepivacaine               2025
51    carbinoxamine             2047
52    bromopheniramine          2067
53    dicyclomine               2080
54    methaqualone              2096
55    pipradrol                 2105
56    phenindamine              2109
57    propranolol               2111
58    methadone                 2121
59    oxymetazoline             2123
60    bromodiphenhydramine      2125
61    hyoscyamine               2146
62    atropine                  2147
63    cocaine                   2161
64    amitriptyline             2162
65    propoxyphene              2165
66    levorphanol               2169
67    thonzylamine              2172
68    nortriptyline             2174
69    procainamide              2175
70    chloroprocaine            2177
71    imipramine                2190
72    zolamine                  2193
double bonds, number of aromatic bonds, molecular weight, number of basis rings, number of ring atoms, and number of electron lone pairs.
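As a minimal sketch of how such count-based descriptors can be read off a stored connection table, the Python fragment below tallies a few of them for cadaverine (compound 1 in Table I). The table layout, the function name `fragment_descriptors`, and the rounded atomic masses are illustrative assumptions; they are not taken from the ADAPT system.

```python
from collections import Counter

# Rounded standard atomic masses (illustrative values).
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007,
               "O": 15.999, "S": 32.06, "Cl": 35.45}

# Hypothetical hydrogen-suppressed connection table for cadaverine,
# H2N-(CH2)5-NH2. Each atom is (element, attached hydrogens);
# each bond is (atom index i, atom index j, bond type).
atoms = [("N", 2), ("C", 2), ("C", 2), ("C", 2), ("C", 2), ("C", 2), ("N", 2)]
bonds = [(i, i + 1, "single") for i in range(6)]


def fragment_descriptors(atoms, bonds):
    """Count atoms and bond types and sum the molecular weight."""
    elements = Counter(element for element, _ in atoms)
    bond_types = Counter(kind for _, _, kind in bonds)
    n_hydrogens = sum(h for _, h in atoms)
    mol_weight = (sum(ATOMIC_MASS[element] for element, _ in atoms)
                  + n_hydrogens * ATOMIC_MASS["H"])
    return {
        "n_atoms": len(atoms),            # non-hydrogen atoms only
        "n_carbons": elements["C"],
        "n_oxygens": elements["O"],
        "n_nitrogens": elements["N"],
        "n_bonds": len(bonds),
        "n_single_bonds": bond_types["single"],
        "n_double_bonds": bond_types["double"],
        "n_aromatic_bonds": bond_types["aromatic"],
        "mol_weight": mol_weight,
    }


desc = fragment_descriptors(atoms, bonds)
print(desc)
```

For cadaverine this gives seven non-hydrogen atoms, six single bonds, and a molecular weight of about 102, which is the lower end of the range quoted for the data set.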
Table I (continued)

no.   compound                  RI
73    doxepin                   2194
74    tetracaine                2197
75    pyrilamine                2200
76    desipramine               2200
77    cyclobenzaprine           2204
78    phenoxybenzamine          2205
79    protriptyline             2207
80    trihexylphenidyl          2211
81    phenazopyridine           2217
82    ethylbenzoylecgonine      2219
83    triprolidine              2224
84    carbetapentane            2230
85    promethazine              2234
86    benactyzine               2235
87    pentazocine               2246
88    pramoxine                 2248
89    dimethindene              2250
90    bupivacaine               2251
91    hyoscine                  2261
92    promazine                 2266
93    oxazepam                  2271
94    benztropine               2287
95    maprotiline               2296
96    levallorphan              2306
97    propoxycaine              2307
98    piperidolate              2316
99    codeine                   2323
100   dihydrocodeine            2323
101   lorazepam                 2353
102   ethylmorphine             2364
103   morphine                  2367
104   hydrocodone               2375
105   hydromorphone             2381
106   diazepam                  2383
107   methdilazine              2421
108   butacaine                 2422
109   chloropromazine           2425
110   oxycodone                 2453
111   chlorprothixene           2460
112   oxymorphone               2462
113   disopyramide              2470
114   cinnamoylcocaine          2480
115   methotrimeprazine         2490
116   mepazine                  2500
117   nalorphine                2510
118   trimethoprim              2514
119   loxapine                  2542
120   amoxapine                 2575
121   diacetylmorphine          2581
122   naloxone                  2608
123   cinchocaine               2675
124   fentanyl                  2681
125   flurazepam                2741
126   quinine                   2471
127   chlordiazepoxide          2742
128   quinidine                 2745
129   clonazepam                2759
130   hydroxyzine               2832
131   anileridine               2839
132   cyclomethycaine           2841
133   haloperidol               2887
134   7-hydroxyamoxapine        2900
135   8-hydroxyamoxapine        2907
136   prochlorperazine          2921
137   meclizine                 3000
138   cholesterol               3041
139   strychnine                3058
140   thioridazine              3080
141   thiethylperazine          3210
142   buclizine                 3267
143   butaperazine              3308
144   mesoridazine              3326
Molecular Connectivity. Molecular connectivity indexes were originally developed by Randić (14) and later modified by Kier and Hall (15). These indexes are based on a graph-theoretical
treatment of the molecular topology of the compounds and encode information about the branching and size of the molecules. The general equation for calculating molecular connectivities of the nth order is

\[ {}^{n}\chi_t = \sum_{s=1}^{n_t} \prod_{i=1}^{m+1} (\delta_i)_s^{-1/2} \]

where ${}^{n}\chi_t$ is the nth order term of type t (t = path, cluster, path-cluster, or chain), $n_t$ is the number of connected subgraphs of type t, m is the number of edges, and $\delta$ is the vertex valence. Nineteen molecular connectivity indexes were calculated for each compound. The indexes calculated include ${}^{0}\chi$, ${}^{1}\chi$, ${}^{2}\chi_{p}$, ${}^{3}\chi_{p}$, ${}^{4}\chi_{p}$, ${}^{5}\chi_{p}$, ${}^{6}\chi_{p}$, ${}^{3}\chi_{c}$, ${}^{4}\chi_{c}$, ${}^{5}\chi_{c}$, ${}^{6}\chi_{c}$, ${}^{4}\chi_{pc}$, ${}^{5}\chi_{pc}$, ${}^{6}\chi_{pc}$, ${}^{3}\chi_{ch}$, ${}^{4}\chi_{ch}$, ${}^{5}\chi_{ch}$, ${}^{6}\chi_{ch}$, and ${}^{7}\chi_{ch}$. All indexes were calculated with adjustments for the presence of non-sp3 carbon centers (15).
Kappa Indexes. Six kappa indexes were also calculated. These indexes were developed by Kier (16) and encode topological shape by using a graph-theoretical approach. Each structure is depicted as a graph consisting of nodes (atoms) and edges (bonds). For the first three kappa indexes, no distinction is made between different atom and bond types. The ${}^{1}\kappa$ index is based on the number of paths of length one (or the number of one-bond fragments) found in the molecule. Similarly, the ${}^{2}\kappa$ and ${}^{3}\kappa$ indexes are based on the number of paths of length two and three. For each index, the number of paths of the appropriate length is taken in relation to the minimum and maximum number of paths of the same length possible for a graph containing the same number of nodes. In general, the nth order kappa index, ${}^{n}\kappa$, for a molecule with N atoms is given by the following equation:

\[ {}^{n}\kappa = C \, \frac{{}^{n}P_{\max} \; {}^{n}P_{\min}}{({}^{n}P_i)^{2}} \]

where C is a constant, ${}^{n}P_{\max}$ is the maximum number of paths of length n for N nodes, ${}^{n}P_{\min}$ is the minimum number of paths of length n for N nodes, and ${}^{n}P_i$ is the actual number of paths of length n for the structure. Three modified indexes (${}^{1}\kappa_{\alpha}$, ${}^{2}\kappa_{\alpha}$, ${}^{3}\kappa_{\alpha}$) were also used, which were designed by Kier to take atom and bond types into consideration (16).
Path Counts. In terms of graph theory, a molecule can be represented as a series of nodes and edges. A path is defined as an alternating sequence of nodes and edges that begins and ends with a node. Several path-related descriptors were generated. These include number of total paths, number of total paths/number of atoms, total weighted paths (17), total weighted paths/number of atoms, Wiener number (18), molecular identification number (MID) (19), MID/number of atoms, $\sum$(atomic IDs for heteroatoms), $\sum$(atomic IDs for oxygen atoms), and $\sum$(atomic IDs for N atoms).
Physicochemical. Physicochemical descriptors are used to indirectly encode structural information about a set of compounds. Two such descriptors were generated: molecular polarizability (20) and molar refraction (21). These descriptors were calculated by using atomic and fragment additivity rules, respectively.
Simple Electronic. Two simple Del Re $\sigma$-charge-based descriptors were calculated (22). Atomic charges were calculated for each atom by summing the contributions from each bond on the atom. From these atomic charges were calculated the total $\sigma$ charge ($\sum$ |partial $\sigma$ charge on each atom|) and the electron density on the atom with the most negative partial $\sigma$ charge.
Model Generation. The next step in model generation is to narrow down the pool of descriptors via objective feature selection. First, those descriptors with fewer than 10% nonzero values are disregarded. These descriptors typically do not have enough information content to be useful. Next, those descriptors with zero or very small relative standard deviations are eliminated.
These descriptors usually have very little discriminating power among the observations in the data set. Finally, it is often desirable to eliminate those descriptors containing redundant information. This is accomplished by evaluating the descriptors for simple and multiple collinearities. When a high collinearity is detected (R > 0.90), one or more of the offending descriptors is removed from consideration. Twenty-eight of the original 53 descriptors were eliminated by using the above criteria. It should be pointed out that the descriptor pruning procedures described do not utilize any knowledge of the dependent variable and are therefore unbiased.
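The three pruning steps can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name, the relative standard deviation cutoff of 0.01, and the rule of keeping the first descriptor of a correlated pair are hypothetical choices, and only pairwise (simple) correlations are checked, not multiple collinearities. The 10% nonzero cutoff and the R > 0.90 limit are the values quoted in the text.

```python
import numpy as np


def objective_feature_selection(X, min_nonzero=0.10, min_rsd=0.01, max_r=0.90):
    """Return indexes of descriptor columns that survive the three filters."""
    X = np.asarray(X, dtype=float)
    kept = []
    for j in range(X.shape[1]):
        col = X[:, j]
        # 1. Drop descriptors with fewer than 10% nonzero values.
        if np.count_nonzero(col) / col.size < min_nonzero:
            continue
        # 2. Drop descriptors with zero or very small relative standard
        #    deviation (the 0.01 cutoff is an assumed value).
        mean, std = col.mean(), col.std(ddof=1)
        if std == 0.0 or (mean != 0.0 and abs(std / mean) < min_rsd):
            continue
        # 3. Drop descriptors highly correlated with one already kept.
        if any(abs(np.corrcoef(col, X[:, k])[0, 1]) > max_r for k in kept):
            continue
        kept.append(j)
    return kept


# Toy descriptor matrix: 20 hypothetical compounds, 5 hypothetical descriptors.
n = np.arange(20, dtype=float)
X = np.column_stack([
    n,                       # informative descriptor
    np.where(n == 3, 1, 0),  # only 5% nonzero values
    np.full(20, 5.0),        # constant
    2 * n + 1,               # collinear with column 0
    (-1.0) ** n,             # weakly correlated with column 0
])
kept = objective_feature_selection(X)
print(kept)  # [0, 4]
```

Note that the retention indexes never enter the function, which mirrors the point made above that the pruning uses no knowledge of the dependent variable.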
[Figure: plot with axis label "Estimated Retention Index"]