Anal. Chem. 1997, 69, 3879-3883

Prediction of Chiral Chromatographic Separations Using Combined Multivariate Regression and Neural Networks

Tristan D. Booth,† Kamal Azzaoui,‡ and Irving W. Wainer*,‡

Departments of Chemistry and Oncology, McGill University, Montreal, Quebec, H3G 1A4 Canada

A new method for the prediction and description of enantioselective separations on HPLC chiral stationary phases (CSPs) is described. Based on the combination of multivariate regression and neural networks, the method was successfully applied to the separation of a series of 29 aromatic acids and amides, chromatographed on three amylosic CSPs. Combinations of charge transfer, electrostatic, lipophilic, and dipole interactions, identified by multivariate regression, were found to describe retention and enantioselectivity, with highly predictive models being generated by the training of back-propagation neural networks.

HPLC chiral stationary phases (CSPs) are now routinely used in enantiomeric separations. With over 100 commercially available CSPs, most racemates can be separated. However, at the present time, there is no reliable way to determine which CSP should be applied to a particular separation. Therefore, development of a system that is able to determine whether or not an enantioselective separation will be achieved on a particular CSP would be of immense benefit.

Predictive models for empirical systems are often attempted but are rarely successful outside the realm of the initial data set. The field of chromatography is one area where much interest has been focused, as the data obtained benefits from the high reproducibility associated with the robust nature of the technique.

One strategy for the development of chemometrically driven predictions of retention and enantioselectivity is the construction of quantitative structure-enantioselective retention relationships (QSERRs). These relationships take an extrathermodynamic approach to identification and isolation of the most important structural characteristics for a series of racemic solutes deemed responsible for the observed chromatographic retention. These descriptors are usually used as the independent variables in multivariate regression analysis and correlated against the experimental ln k′ data.
Development of statistically significant equations allows for the possibility of extracting physically meaningful information relating to the fundamental solute-stationary phase interactions. Kaliszan et al.1,2 successfully used this approach to investigate chiral recognition on a human serum albumin CSP and, consequently, provide information on the HSA-benzodiazepine binding site.3 Similar studies have also been performed on CSPs such as (S,S)-N,N′-(3,5-dinitrobenzoyl)-trans-1,2-diaminocyclohexane,4 cellulose triacetate,5 and amylose tris(3,5-dimethylphenyl)carbamate (Chiralpak AD).6,7

Neural networks8-10 have recently found application for solving different chemical problems. This results from the development of nonlinear transfer functions and feedback coupling,11 which gave new flexibility to old networks. They have successfully been applied to various areas of chromatography and have demonstrated considerable potential for retention prediction in achiral systems.12-16 Networks are made up of several layers of interconnected neurons, to which pairs of input/output data are presented. The input data consist of a set of pattern vectors, each describing a given solute in terms of nonempirical solute descriptors. The output data are the empirically determined retention factors for the respective solutes. During the training phase, neuron weights are adjusted by repeatedly presenting input/output pairs to the network, until the network has been optimized to give an acceptable final output for a given input set. The network may then be tested with previously unseen input data and the network output compared to the empirically determined results to assess the predictive power of the system.

In this article, we report the application of multivariate regression analysis, combined with multilayer feed-forward neural networks trained with error back-propagation, to model the enantioselective chromatographic retention behavior of a series of aromatic acids and amides as a function of nonempirical solute descriptors. The solutes were chromatographed on three amylose-based CSPs: the commercially available Chiralpak AD and amylose (S)-phenylethyl carbamate (Chiralpak AS) and the experimental amylose (R)-phenylethyl carbamate (AR). Solute descriptors were first identified using classical QSERR and then transposed to the input layer of the networks. QSERR was able to provide information relating to the fundamental mechanistic interactions involved in the retention and separation of solutes on the three CSPs. However, due to the low predictive ability obtained with multivariate regression, neural networks were applied with the aim of developing enantioselective expert systems.

* Address correspondence to this author at Montreal General Hospital, Room B7103, 1650 Cedar Ave., Montreal, Quebec, H3G 1A4 Canada. Fax: 514-9348214. E-mail: [email protected].
† Department of Chemistry.
‡ Department of Oncology.
(1) Kaliszan, R.; Noctor, T. A. G.; Wainer, I. W. Chromatographia 1992, 33, 546.
(2) Kaliszan, R.; Kaliszan, A.; Noctor, T. A. G.; Purcell, W. P.; Wainer, I. W. J. Chromatogr. 1992, 609, 69.
(3) Kaliszan, R.; Noctor, T. A. G.; Wainer, I. W. Mol. Pharmacol. 1992, 42, 512.
(4) Altomare, C.; Carotti, A.; Cellamare, S.; Fanelli, F.; Gasparrini, F.; Villani, C.; Carrupt, P-A.; Testa, B. Chirality 1993, 5, 527.
(5) Wolf, R. M.; Francotte, E.; Lohmann, D. J. Chem. Soc., Perkin Trans. 2 1988, 893.
(6) Booth, T. D.; Wainer, I. W. J. Chromatogr. A 1996, 737, 157.
(7) Booth, T. D.; Wainer, I. W. J. Chromatogr. A 1996, 741, 205.
(8) McCulloch, W. S.; Pitts, W. Bull. Math. Biophys. 1943, 5, 115.
(9) Pitts, W.; McCulloch, W. S. Bull. Math. Biophys. 1947, 9, 127.
(10) Hebb, D. O. The Organisation of Behavior; Wiley: New York, 1949.
(11) Hopfield, J. J. Proc. Natl. Acad. Sci. U.S.A. 1982, 79, 2554.
(12) Glen, R. C.; Rose, V. S.; Lindon, J. C.; Wilson, I. D.; Nicholson, J. K. J. Planar Chromatogr. 1991, 4, 432.
(13) Peterson, K. L. Anal. Chem. 1992, 64, 379.
(14) Cupid, C.; Nicholson, J. K.; Davis, P.; Ruane, R. J.; Wilson, I. D.; Glen, R. C.; Rose, V. S.; Beddell, J. C.; Lindon, J. C. Chromatographia 1993, 37, 241.
(15) Xie, Y. L.; Baeza-Baeza, J. J.; Torres-Lapasio, J. R.; Garcia-Alvarez-Coque, M. C.; Ramis-Ramos, G. Chromatographia 1995, 41, 435.
(16) Azzaoui, K. Application des techniques de modélisation moléculaire à l'étude des mécanismes de rétention en chromatographie liquide. Ph.D. Thesis, Université d'Orléans, France, 1995.
S0003-2700(97)00215-1 CCC: $14.00 © 1997 American Chemical Society
Analytical Chemistry, Vol. 69, No. 19, October 1, 1997

Table 1. Chiral Acids and Amides Chromatographed on the Three Amylose-Based CSPs

no.  empirical formula  compound
1    C13H18O2           α-methyl-4-(2-methylpropyl)benzeneacetic acid
2    C16H14O3           3-benzoyl-α-methylbenzeneacetic acid
3    C14H12O3S          α-methyl-4-(thienylcarbonyl)benzeneacetic acid
4    C16H12ClNO3        2-(4-chlorophenyl)-α-methyl-5-benzoxazoleacetic acid
5    C15H13FO2          2-fluoro-α-methyl-4-biphenylacetic acid
6    C9H9ClO3           2-(2-chlorophenoxy)propionic acid
7    C9H10O3            2-phenoxypropionic acid
8    C9H9ClO3           2-(4-chlorophenoxy)propionic acid
9    C9H10O2            2-phenylpropionic acid
10   C10H12O2           2-phenylbutyric acid
11   C11H14O2           2-phenyl-3-methylbutyric acid
12   C11H14O2           2-phenylpentanoic acid
13   C12H16O2           2-phenyl-4-methylpentanoic acid
14   C12H16O2           2-phenylhexanoic acid
15   C14H20O2           2-phenyloctanoic acid
16   C8H7FO2            α-fluorophenylacetic acid
17   C9H10O3            α-methoxyphenylacetic acid
18   C8H8O3             α-hydroxyphenylacetic acid
19   C9H10O3            2-hydroxy-3-phenylpropionic acid
20   C9H10O3            2-phenyl-3-hydroxypropionic acid
21   C9H9ClO2           2-(4-chlorophenyl)propionic acid
22   C10H12O3           2-(4-methoxyphenyl)propionic acid
23   C11H15NO           N-benzyl-2-methylbutylamide
24   C12H17NO           N-benzyl-2-methylpentylamide
25   C13H19NO           N-benzyl-2-methylhexylamide
26   C14H21NO           N-benzyl-2-methylheptylamide
27   C15H23NO           N-benzyl-2-methyloctylamide
28   C14H20ClNO         N-(1-methylhexyl)-4-chlorobenzamide
29   C15H23NO           N-(1,5-dimethylhexyl)benzamide
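The training procedure outlined in the introduction (input/output pairs presented repeatedly, with weights adjusted by error back-propagation) can be illustrated with a minimal sketch. It is not the TSAR implementation used in this work. The data are synthetic, the function names (`scale`, `train`) are hypothetical, and the fixed learning rate is a simplification. The 0.1 to 1 input scaling, the starting weight and bias ranges, and the starting learning rate and momentum are taken from the values reported in the Experimental Section.

```python
import math
import random


def scale(values, lo=0.1, hi=1.0):
    """Min-max scale a sequence into [lo, hi] (0.1 to 1 as reported in the paper)."""
    vmin, vmax = min(values), max(values)
    return [lo + (hi - lo) * (v - vmin) / (vmax - vmin) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(patterns, targets, n_hidden=2, epochs=500, lr=0.025, momentum=0.9, seed=0):
    """Train an n_in-n_hidden-1 feed-forward net by back-propagation with momentum.

    Returns (predict_function, rms_per_epoch). Fixed lr and momentum are an
    assumption of this sketch; the paper lets both decay during training.
    """
    rng = random.Random(seed)
    n_in = len(patterns[0])
    # Each neuron stores its input weights followed by one bias term.
    w_h = [[rng.uniform(-0.03, 0.03) for _ in range(n_in)] + [rng.uniform(-1, 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.03, 0.03) for _ in range(n_hidden)] + [rng.uniform(-1, 1)]
    dw_h = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    dw_o = [0.0] * (n_hidden + 1)

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1]) for ws in w_h]
        y = sigmoid(sum(w * hi for w, hi in zip(w_o[:-1], h)) + w_o[-1])
        return h, y

    history = []
    for _ in range(epochs):
        squared_error = 0.0
        for x, t in zip(patterns, targets):
            h, y = forward(x)
            err = t - y
            squared_error += err * err
            delta_o = err * y * (1.0 - y)
            # Hidden deltas are computed before the output weights change.
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j, hj in enumerate(h + [1.0]):
                dw_o[j] = lr * delta_o * hj + momentum * dw_o[j]
                w_o[j] += dw_o[j]
            for j in range(n_hidden):
                for i, xi in enumerate(list(x) + [1.0]):
                    dw_h[j][i] = lr * delta_h[j] * xi + momentum * dw_h[j][i]
                    w_h[j][i] += dw_h[j][i]
        history.append(math.sqrt(squared_error / len(patterns)))
    return (lambda x: forward(x)[1]), history


# Synthetic stand-in for 12 solutes described by 4 descriptors (not the paper's data).
data_rng = random.Random(1)
raw = [[data_rng.uniform(-1.0, 3.0) for _ in range(4)] for _ in range(12)]
patterns = [list(row) for row in zip(*[scale(col) for col in zip(*raw)])]
targets = scale([0.5 * r[0] - 0.3 * r[2] for r in raw])
predict, rms_history = train(patterns, targets)
```

The 4-2-1 layout mirrors the configuration notation used later in Table 4. A held-out test set would be passed through `predict` after training to estimate predictive power.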

EXPERIMENTAL SECTION

Chemicals. Solute formulas are given in Table 1. Solutes 1, 4, and 11-15 were obtained from Upjohn (Kalamazoo, MI); solutes 3 and 4 from Sigma (St. Louis, MO); solute 5 from Eli Lilly (Indianapolis, IN); solutes 6-10 and 16-20 from Aldrich (Milwaukee, WI); and solutes 21 and 22 from Dr. W. J. Lough (University of Sunderland, Tyne and Wear, UK). All of the amides, solutes 23-29, were prepared according to previously reported procedures.17

Chromatography. Separations were performed on 250 mm × 4.6 mm i.d. columns packed with Chiralpak AD and AS (Chiral Technologies, Exton, PA). The AR CSP packed in a 250 mm × 4.6 mm i.d. column was kindly provided by Prof. Y. Okamoto (University of Nagoya, Japan). Elution orders were determined on a Chiramonitor 2000 optical rotation detector (Interscience, Markham, ON, Canada). Column temperature regulation was achieved using a Haake D1-G refrigerated bath/circulator (Fischer Scientific Ltd., Montreal, PQ, Canada) and a column water jacket. All injections were made using a mobile phase of hexane-2-propanol (95:5 v/v) plus 1% trifluoroacetic acid, filtered and degassed. All samples were prepared in mobile phase and run at 27 °C, with a flow rate of 1 mL min⁻¹. Column performance was monitored daily to ensure no variation in column properties.

Computational Chemistry. Molecular structures were created with Insight II ver. 235 (Molecular Simulation Inc., San Diego, CA) on an IBM RS6000 RISC. Complete conformational searches were performed on all structures using Search/Compare ver. 2.3 (Molecular Simulation Inc.), and molecular geometries were optimized using the semiempirical molecular orbital program MOPAC 6 (keywords: am1, precise); this program was also used for the calculation of charges, dipole moment, and LUMO energy. The Connolly surfaces18 were calculated using MOLCAD (Tripos Associates Inc., St. Louis, MO) run on a Silicon Graphics Indigo. Specific properties such as molecular electrostatic potential (MEP)19 or molecular lipophilic potential (MLP)20 may be mapped onto the Connolly surface. Software developed at the University of Orleans, France, was used to calculate the maximum, minimum, and average values of MEP and MLP.21

Statistical Analysis and Neural Networks. Statistical analysis and neural network training were performed using TSAR ver. 2.41 (Oxford Molecular Ltd., Oxford, UK). Multivariate regression analyses were run using the nonstandard deviation method with stepwise regression. The leave-one-out cross-validation method was used to estimate the predictive power of regression models. All the data presented to the neural networks were automatically scaled to fall between 0.1 and 1. An initial weighting value of 1.0 was applied to all connections. Starting weights in the range of -0.03 to +0.03 and initial node biases in the range of -1 to +1 were selected using the Monte Carlo algorithm. In order to prevent networks from becoming trapped in local minima and to promote faster convergence, the general equation for the correction of weights22 (with momentum) was used. Weights and bias terms were adjusted during training. The learning rate and momentum start at maximum values of 0.025 and 0.9, respectively, and decay to minimum values of 0.0001 and 0.0009.

RESULTS AND DISCUSSION

Retention and solute descriptor data for the 29 racemic compounds chromatographed in this study are presented in Table 2. The most significant molecular descriptors were identified using multivariate regression analysis, and a summary of the best equations obtained for the retention of each solute enantiomer on the AD, AS, and AR CSPs is presented in Table 3. Combinations of the same four descriptors were used to describe enantiomer retention on all three CSPs. Five of the resulting equations contained three descriptors, while the (S)-enantiomers on the AD phase required a four-term equation. Good correlation coefficients ranging from 0.938 to 0.979 were obtained for the AS and AR phases, but only borderline values were obtained for the AD phase.

(17) Wainer, I. W.; Alembic, M. C. J. Chromatogr. 1986, 358, 85.
(18) Connolly, M. L. Science 1983, 211, 709.
(19) Weiner, P. K.; Langridge, R.; Blaney, J. M.; Schaefer, R.; Kollman, P. A. Proc. Natl. Acad. Sci. U.S.A. 1982, 79, 3754.
(20) Fauchère, J-L.; Quarnedon, P.; Kaetterer, L. J. Mol. Graph. 1988, 6, 203.
(21) Azzaoui, K.; Morin-Allory, L. Unpublished results.
(22) Zupan, J.; Gasteiger, J. Anal. Chim. Acta 1991, 248, 1.
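The leave-one-out cross-validation used to judge the regression models can be sketched as follows. This is an illustrative outline only: the descriptor values are synthetic, the helper names are hypothetical, no stepwise descriptor selection is performed, and taking the cross-validated statistic as the correlation between observed and left-out predictions is an assumption, since the exact definition used by TSAR is not stated here.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(X, y):
    """Ordinary least squares with an intercept (last coefficient), via normal equations."""
    rows = [list(x) + [1.0] for x in X]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(ata, aty)


def predict(coef, x):
    return sum(c * xi for c, xi in zip(coef, x)) + coef[-1]


def loo_predictions(X, y):
    """Refit the model with each solute left out in turn and predict that solute."""
    preds = []
    for i in range(len(y)):
        coef = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return preds


def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


# Synthetic, noise-free example with two descriptors (not data from this study).
X = [[0.1, 1.0], [0.4, 0.2], [0.7, 0.9], [1.0, 0.3], [0.5, 0.5], [0.2, 0.6]]
y = [2.0 * x1 - 3.0 * x2 + 1.0 for x1, x2 in X]
r_cv = pearson_r(y, loo_predictions(X, y))
```

With real, noisy retention data the cross-validated value falls below the fitted correlation coefficient, and a large gap between the two signals a model that describes the training set but predicts poorly.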

Table 2. Solute Retention Data and Descriptor Values

            AD            AS            AR                solute descriptors
no.   ln k′R ln k′S ln k′R ln k′S ln k′R ln k′S        1        2        3        4
1       0.39   0.39  -0.37  -0.36     nd     nd    0.198   -11.41    0.103    1.739
2       2.14   2.25   1.78   1.78   2.06   2.06   -0.513    -8.30    0.098    2.884
3       2.64   2.49   2.49   2.65   2.85   2.85   -0.637    -0.71    0.110    2.681
4       2.62   3.21   1.96   1.64   1.64   1.75   -1.015   -7.645    0.129    0.770
5       0.91   1.30   0.51   0.58   0.69   0.69   -0.442    -8.83    0.119    0.807
6b      0.46   0.66   0.45   0.75   0.58   0.79   -0.063    -8.35    0.107    3.288
7b      0.53   0.82   0.45   0.71   0.48   0.62    0.315   -10.06    0.078    2.457
8b      0.83   1.07     nd     nd   0.48   0.57   -0.028    -8.08    0.116    2.296
9       0.63   0.72     nd     nd   0.22   0.22    0.213   -10.95    0.096    1.690
10      0.59   0.71   0.02  -0.04   0.03   0.03    0.216   -11.33    0.100    1.695
11      0.44   0.56  -0.27  -0.39  -0.26  -0.26    0.219   -11.86    0.100    1.725
12      0.51   0.60  -0.06   0.05  -0.02  -0.02    0.219   -11.37    0.104    1.679
13      0.31   0.42  -0.33  -0.33  -0.24  -0.24    0.231   -11.39    0.102    1.691
14      0.44   0.58  -0.07  -0.07  -0.22  -0.22    0.218   -11.40    0.103    1.673
15      0.35   0.54  -0.29  -0.29  -0.14  -0.14    0.217   -11.35    0.106    1.666
16      0.92   0.84   1.42   1.18   0.99   1.50   -0.088    -7.72    0.091    2.764
17      1.01   1.10     nd     nd   1.47   1.15    0.010    -8.42    0.067    1.817
18      2.32   2.16   2.16   2.12   2.20   2.13   -0.035    -7.01    0.061    2.228
19      1.88   1.68   1.82   1.82   1.81   1.67    0.193    -9.81    0.060    1.858
20b     2.51   2.31   2.01   2.01   2.22   2.22    0.152    -8.35    0.050     2.17
21      0.80   0.90   0.09   0.09  -0.11  -0.11   -0.144    -8.58    0.135    1.872
22      1.43   1.23   0.85   0.85   0.90   0.90    0.242   -11.18    0.073    0.952
23      0.99   0.92   0.68   0.68   1.17   1.17    0.360    -6.79    0.082    3.311
24      0.96   0.82   0.49   0.49   1.29   1.20    0.359    -7.12    0.084    3.297
25      0.88   0.76   0.32   0.32   1.23   1.19    0.360    -7.17    0.087    3.273
26      0.82   0.72   0.16   0.16   0.64   0.36    0.359    -7.18    0.089    3.271
27      0.71   0.63  -0.01  -0.01   0.84   0.42    0.358    -7.08    0.092    3.257
28      0.66   0.88   0.77   1.16     nd     nd   -0.070   -13.28    0.085    3.299
29      0.30   0.30   0.31   0.52   2.51   2.06   -0.067   -13.38    0.086    3.303

a $k' = (t_r - t_0)/t_0$, where $k'$ is the enantiomer retention factor, $t_r$ is the enantiomer retention time, and $t_0$ is the retention time of an unretained solute or solvent front. b Due to apparent (R)/(S) retention inversion (relative to the rest of the series) arising only from the Cahn-Ingold-Prelog naming convention for chiral compounds, enantiomeric retention factors for these compounds were reversed. Solute descriptors: 1, LUMO (energy of the lowest unoccupied molecular orbital); 2, MEP (average molecular electrostatic potential); 3, MLP (average molecular lipophilic potential); 4, DIP (total dipole moment). nd, not determined.
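As a small worked illustration of the retention factor defined in footnote a, the sketch below computes k′ from retention times and converts a pair of ln k′ values into a separation factor. The retention times are hypothetical, the function names are illustrative, and the relation α = k′(second eluted)/k′(first eluted) is the standard chromatographic definition, assumed here to be the quantity reported as the separation value in Table 8.

```python
import math


def retention_factor(t_r, t_0):
    """k' = (t_r - t_0) / t_0, with t_0 the retention time of an unretained solute."""
    return (t_r - t_0) / t_0


def separation_factor(ln_k_r, ln_k_s):
    """Return (alpha, first_eluting) from the ln k' values of the two enantiomers.

    alpha is the ratio of the larger to the smaller retention factor, so it is
    always >= 1; first_eluting is None when the two values are identical.
    """
    alpha = math.exp(abs(ln_k_r - ln_k_s))
    if ln_k_r < ln_k_s:
        first = "R"
    elif ln_k_s < ln_k_r:
        first = "S"
    else:
        first = None
    return alpha, first


# Hypothetical retention times in minutes (not measured values from this study).
k = retention_factor(t_r=9.9, t_0=3.0)

# Solute 8 on the AD CSP: ln k'R = 0.83 and ln k'S = 1.07 in Table 2.
alpha, first = separation_factor(0.83, 1.07)
print(round(alpha, 2), first)  # 1.27 R
```

The printed result agrees with the experimental entry for solute 8 on the AD CSP in Table 8.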

Table 3. Summary of Results Obtained from Multivariate Regression Analysis(a)

        AD CSP                    AS CSP                    AR CSP
        (R)          (S)          (R)          (S)          (R)          (S)
f1      -1.639       -1.624       -2.062       -1.955       -2.554       -2.563
  t     -5.71        -5.66        -11.3        -11.02       -10.1        -11.88
  p     5.5 × 10⁻⁵   2.3 × 10⁻⁵   2.5 × 10⁻⁹   3.7 × 10⁻⁹   7.6 × 10⁻⁹   6.0 × 10⁻¹⁰
f2      0.098        0.112        0.114        0.119        0            0
  t     2.64         3.12         5.42         5.82
  p     0.016        0.006        4.5 × 10⁻⁵   2.0 × 10⁻⁵
f3      -21.84       -18.57       -33.18       -33.20       -36.43       -34.21
  t     -4.93        -4.38        -11.1        -11.41       -8.26        -9.08
  p     3.1 × 10⁻⁴   3.6 × 10⁻⁵   3.4 × 10⁻⁹   2.2 × 10⁻⁹   1.6 × 10⁻⁷   3.8 × 10⁻⁸
f4      0            -0.279       0            0            0.417        0.412
  t                  -2.65                                  4.33         5.02
  p                  0.016                                  4.0 × 10⁻⁴   8.8 × 10⁻⁵
f5      4.057        4.539        4.933        4.997        3.510        3.299
R       0.874        0.916        0.971        0.972        0.938        0.952
RCV     0.166        0.386        0.926        0.777        0.832        0.817
s       0.400        0.339        0.226        0.219        0.352        0.301
F       20.5         23.4         95           96.8         43.7         57.8
n       23           23           21           21           22           22

a General equation: $\ln k' = f_1(\mathrm{LUMO}) + f_2(\mathrm{MEP}) + f_3(\mathrm{MLP}) + f_4(\mathrm{DIP}) + f_5$. t, t-test; p, significance level; R, correlation coefficient; RCV, cross-validation; s, standard error of the estimate; F, Fisher test; n, number of solutes.
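To make the general equation of Table 3 concrete, the short sketch below evaluates it for one solute. It uses the AS (R) coefficients from Table 3 and the descriptor values of solute 2 from Table 2; the dictionary layout and the function name `predict_ln_k` are illustrative choices, not part of the original work.

```python
# AS CSP, (R)-enantiomer coefficients from Table 3 (f4 is absent, so 0).
AS_R = {"LUMO": -2.062, "MEP": 0.114, "MLP": -33.18, "DIP": 0.0, "const": 4.933}

# Descriptor values of solute 2 from Table 2.
solute_2 = {"LUMO": -0.513, "MEP": -8.30, "MLP": 0.098, "DIP": 2.884}


def predict_ln_k(coeffs, descriptors):
    """ln k' = f1*LUMO + f2*MEP + f3*MLP + f4*DIP + f5."""
    return coeffs["const"] + sum(coeffs[name] * value for name, value in descriptors.items())


ln_k = predict_ln_k(AS_R, solute_2)
print(round(ln_k, 2))  # 1.79, against an experimental ln k'R of 1.78 in Table 2
```

The close agreement for this solute reflects the fit to a compound in the data set; it does not by itself indicate predictive power, which the cross-validation values in Table 3 address.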

This may be associated with a larger variation in retention factors being observed on the AD phase, which necessitated a more detailed model for adequate description of the empirical data. Most of the terms in the equations carry significance levels in the range 3.1 × 10⁻⁴ to 6 × 10⁻¹⁰. Only the MEP and DIPOLE terms in the AD phase regressions have higher, although still acceptable, values.

The complexity of a QSERR model is entirely governed by the size of the initial data set, with the predictive power of the model being determined by cross-validation. Results of leave-one-out cross-validations for each of the six derived relationships are presented in Table 3.

Due to the similarities between the three CSPs, it is not surprising that a common set of descriptors was isolated. The differences between the phases and, indeed, between enantiomers are thus highlighted by varying combinations and weights of the individual terms. The LUMO and MLP terms are present in all equations. The LUMO descriptor reflects the presence of charge transfer interactions between the solutes and CSP.2 The MLP descriptor and its associated interactions incorporate a combination of lipophilicity with steric and geometric factors.23 The regression coefficients f1 and f3 can be used as interaction strength indicators for the LUMO and MLP descriptors. The absolute values of f1 follow the order AR > AS > AD, reflecting increasing charge transfer interactions from the AD to the AR system. This is probably due to the conformation of the stationary phase backbone, allowing increased accessibility to hydrogen bond acceptor groups and/or charge transfer sites. It has previously been demonstrated that the LUMO descriptor can be used as a reflection of hydrogen bond donor ability in liquid chromatography.24 For the AS and AR stationary phases, the MLP coefficient values are quite similar. The corresponding absolute value for the AD phase is much lower, indicating reduced interactions between solutes and the CSP. The f2 term for the AS and the f4 term for the AR relate to nonspecific (i.e., general retention) interactions.

(23) Azzaoui, K.; Lafosse, M.; Lazar, S.; Thiéry, V.; Morin-Allory, L. J. Liq. Chromatogr. 1995, 18, 3021.
(24) Azzaoui, K.; Morin-Allory, L. Chromatographia 1996, 42, 389.

Table 4. Summary of the Final Optimized Neural Networks

                      AD CSP            AS CSP            AR CSP
                      (R)     (S)       (R)     (S)       (R)     (S)
input variable 1      LUMO    LUMO      LUMO    LUMO      LUMO    LUMO
input variable 2      MLP     MLP       MLP     MLP       MLP     MLP
input variable 3      MEP     MEP       MEP     MEP       MEP     MEP
input variable 4              DIPOLE    DIPOLE  DIPOLE    DIPOLE  DIPOLE
configuration         3-3-1   4-3-1     4-2-1   4-2-1     4-2-1   4-3-1
no. of epochs         1600    1991      2460    1981      2482    2988
iterations per epoch  2500    500       2500    2500      1000    2500
training RMS          0.160   0.127     0.143   0.173     0.232   0.164
test RMS              0.182   0.246     0.153   0.192     0.140   0.178

Table 5. Experimental and Network Predicted Retention Factors for Test Solutes Chromatographed on the AD, AS, and AR CSPs

       AD CSP                        AS CSP                        AR CSP
       ln k′R        ln k′S          ln k′R        ln k′S          ln k′R        ln k′S
no.    exp    pred   exp    pred     exp    pred   exp    pred     exp    pred   exp    pred
8      0.83   0.73   1.07   0.71     nd     nd     nd     nd       0.48   0.19   0.57   0.16
13     0.31   0.46   0.42   0.60     -0.33  -0.18  -0.33  -0.17    -0.24  -0.09  -0.24  -0.09
18     2.32   2.11   2.16   1.66     2.16   2.12   2.12   2.09     2.20   2.37   2.13   1.76
23     0.96   0.86   0.82   0.80     0.49   0.47   0.49   0.48     1.30   1.05   1.20   0.99
27     0.71   0.92   0.63   0.74     -0.01  0.19   -0.01  0.15     0.84   0.64   0.42   0.49
28     0.66   0.35   0.88   0.42     0.77   0.35   1.16   0.55     nd     nd     nd     nd

Table 6. Results Summary for the Ability of Trained Networks To Predict Output from Previously Unseen Data(a)

                          AD CSP          AS CSP          AR CSP
∆ln k′                    (R)     (S)     (R)     (S)     (R)     (S)
mean                      0.06    0.17    0.03    0.06    0.09    0.15
standard deviation        0.20    0.30    0.24    0.32    0.22    0.25
mean absolute deviation   0.16    0.27    0.16    0.22    0.19    0.21
n                         6       6       5       5       5       5

a $\Delta \ln k' = \ln k'_{\mathrm{exptl}} - \ln k'_{\mathrm{pred}}$.
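The summary statistics of Table 6 can be recomputed from the experimental and predicted ln k′ values of Table 5. The sketch below does so for the (R)-enantiomers on the AD CSP. Two choices here are assumptions that happen to reproduce the tabulated 0.06, 0.20, and 0.16: the standard deviation is the sample (n − 1) form, and the mean absolute deviation is taken about the mean of the differences rather than about zero.

```python
import statistics

# AD CSP, (R)-enantiomers, test solutes 8, 13, 18, 23, 27, 28 (Table 5).
exp_ln_k = [0.83, 0.31, 2.32, 0.96, 0.71, 0.66]
pred_ln_k = [0.73, 0.46, 2.11, 0.86, 0.92, 0.35]


def deviation_summary(exp, pred):
    """Summarize the differences (experimental - predicted) ln k'."""
    diffs = [e - p for e, p in zip(exp, pred)]
    mean = statistics.mean(diffs)
    return {
        "mean": mean,
        "sd": statistics.stdev(diffs),  # sample standard deviation (n - 1)
        "mad": statistics.mean([abs(d - mean) for d in diffs]),  # about the mean
        "n": len(diffs),
    }


summary = deviation_summary(exp_ln_k, pred_ln_k)
print("%.2f %.2f %.2f %d" % (summary["mean"], summary["sd"], summary["mad"], summary["n"]))
# 0.06 0.20 0.16 6
```

A mean close to zero indicates little systematic bias in the network predictions, while the standard deviation gives the typical scatter of a single prediction.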


The regression coefficients for the (R) and (S) equations may be compared to provide an indication of the differences in (R)-CSP and (S)-CSP interactions. The nature and magnitude of the interactions are related to the enantioselectivity of the stationary phases. Comparison of the (R) and (S) equations for each stationary phase gives an indication as to the contributions of each descriptor toward enantioselectivity. The largest difference between terms on the AS CSP is observed for f1. A decrease from (R) to (S) of around 5% indicates that enantioselectivity on the AS CSP is influenced to the largest extent by charge transfer interactions. The largest difference between terms on the AR CSP is observed for f3. A decrease from (R) to (S) of around 6% indicates that enantioselectivity on the AR CSP is influenced to the largest extent by lipophilic and steric interactions. As the AS and AR CSPs differ only in the configuration of one stereogenic center, this result would indicate that major conformational differences exist between the two phases. In contrast to the order proposed in the AD CSP,6 the AS and AR CSPs are believed to exhibit substantial disorder resulting from the nonlinearity of the chiral phenylethyl derivatives. This disorder serves to alter the chiral binding environment, thus permitting varying types and magnitudes of interactions leading to enantioselectivity. These findings are consistent with previously reported results.25

The major factor influencing enantioselectivity on the AD CSP for this series of compounds would appear to be the dipole term, f4. Although the f2 and f3 terms differ by almost equivalent amounts, they differ in opposite directions, thus effectively canceling any enantioselectivity. The f4 term is present in only one equation, thus giving substantial differentiation between the two equations and indicating that this phase, in general, is the more selective of the three.

In five out of the six regression equations, cross-validation results indicate that the predictive ability of the models is low. Only the equation derived for (R)-enantiomers on the AS CSP demonstrated any degree of predictive power.

Back-propagation neural networks were trained using data from the same compounds as those used for the multivariate regression. Optimal network architecture was determined using a limit of four neurons in the input layer, corresponding to the set of common solute descriptors. Networks were trained for the retention of (S)- and (R)-enantiomers on all three CSPs. A summary of the final optimized networks is given in Table 4. All networks were trained in under 3000 epochs, with the optimal number of iterations per epoch ranging from 500 to 2500. Training was considered to be completed when no improvement in the RMS output was observed over 400 epochs.

A test set of previously unseen acids and amides, approximately equal in size to 25% of the initial sample set, was chosen to probe the predictive properties of the trained networks. The appropriate descriptor data for the test compounds were calculated and presented to the networks. Results for retention factor prediction of test solutes are given in Tables 5 and 6. Linear regression was performed between the experimental retention factors and the network training outputs (including predicted values from the test compounds), and the results are shown in Table 7. Leave-one-out cross-validation results indicate that the networks seem to be quite effective in modeling the structure-retention relationships, with a view to predicting enantioselective chromatographic retention.

Table 8 gives a comparison of the experimental and predicted separation values, along with the associated elution orders. For 11 out of the 16 test compounds (all three CSPs), predicted α values are quite close to the experimentally determined values. In all cases where no separation was predicted, none was found experimentally. Separation was predicted only once (solute 27 on the AS CSP), where none was found experimentally; however, the actual value of the predicted separation is very low. The predicted elution orders are incorrect in only two cases (solute 8 on the AD and AR CSPs).

(25) Booth, T. D.; Lough, W. J.; Saeed, M.; Noctor, T. A. G.; Wainer, I. W. Chirality 1997, 9, 173.

Table 7. Results of Experimental versus Network Output Linear Regression(a)

          AD CSP            AS CSP            AR CSP
          (R)      (S)      (R)      (S)      (R)      (S)
R         0.988    0.974    0.988    0.980    0.980    0.984
RCV       0.973    0.942    0.969    0.958    0.956    0.964
s         0.119    0.168    0.136    0.176    0.186    0.164
F         1060     494      1011     575      605      741
n         29       29       26       26       27       27
test n    6        6        5        5        5        5
train n   23       23       21       21       22       22

a R, correlation coefficient; RCV, cross-validation; s, standard error of the estimate; F, Fisher test; n, total number of solutes; test n, number of solutes in the test set; train n, number of solutes used for training the networks.

Table 8. Experimental and Predicted Separation Values and Elution Orders for Test Solutes Chromatographed on the AD, AS, and AR CSPs(a)

       AD CSP                      AS CSP                      AR CSP
       exp           pred          exp           pred          exp           pred
no.    α      conf   α      conf   α      conf   α      conf   α      conf   α      conf
8      1.27   R      1.02   S      nd     nd     nd     nd     1.09   R      1.03   S
13     1.12   R      1.15   R      1.00   na     1.00   na     1.00   na     1.00   na
18     1.17   S      1.57   S      1.04   S      1.03   S      1.07   S      1.84   S
23     1.15   S      1.06   S      1.00   na     1.01   na     1.10   S      1.06   S
27     1.09   S      1.20   S      1.00   na     1.04   S      1.53   S      1.17   S
28     1.24   R      1.07   R      1.47   R      1.22   R      nd     nd     nd     nd

a α, chromatographic separation factor; conf, configuration of the first-eluting enantiomer; na, not applicable due to no separation; nd, not determined.

CONCLUSIONS

A combined multivariate regression/neural network approach has been successfully used to develop expert systems capable of prediction and description of enantioselective separations on three amylosic CSPs.
Neural networks have been shown to be capable of much higher predictive power than multivariate regression, although the descriptive power of the networks is limited and must be compensated by the regression equations. Further application of this combined approach should generate a better understanding of enantioselective processes, as well as provide a valuable chromatographic tool.

ACKNOWLEDGMENT

The authors thank Chiral Technologies Inc., Exton, PA, for financial support and Prof. L. Morin-Allory for allowing access to software developed at the University of Orleans. This work is also supported in part by an NSERC grant awarded to I.W.W.

Received for review February 24, 1997. Accepted June 16, 1997.X

AC9702150

X Abstract published in Advance ACS Abstracts, August 1, 1997.
