Anal. Chem. 1991, 63, 2021-2024


Prediction of Gas Chromatographic Relative Retention Times of Stimulants and Narcotics

C. G. Georgakopoulos and J. C. Kiburis
Doping Control Laboratory, The Olympic Athletic Center of Athens, Kifissias 37, 15123 Maroussi, Athens, Greece

P. C. Jurs*
Department of Chemistry, The Pennsylvania State University, 152 Davey Laboratory, University Park, Pennsylvania 16802

The ADAPT software system was used to create models for the prediction of gas chromatographic relative retention times (RRTs) of stimulants and narcotics that are analyzed in doping control of athletes. The two main methods that were followed for building the models were the quantitative structure-retention relationship (QSRR) and multiple linear regression analysis. The main proposed model for the entire data set had a multiple correlation coefficient R = 0.991 and standard error s = 0.046 or approximately 4.5%. Because of the relatively high standard error of the main model, a second model was built on a subset of compounds with R = 0.982 and s = 0.027 or approximately 2.5%.

INTRODUCTION

One of the major classes of athletic doping agents is composed of stimulants and narcotics. These drugs are analyzed, after extraction from urine, by a two-step procedure: (a) capillary gas chromatographic analysis and preliminary identification of unknown peaks by comparison with standards and (b), in the case of a positive sample, GC/MS analysis following the analysis in step a. It is of great importance that the first screening procedure be complete and accurate to conserve effort and cost. Therefore, large collections of standard compounds are required. The existence of a model for predicting relative retention times (RRTs) based on structural information would assist in the interpretation of analytical data and supplement collections of standards.

The general process of relating molecular structures with chemical, physicochemical, and biological activities is called quantitative structure-activity relationship (QSAR) (1). A branch of QSAR, QSRR (quantitative structure-retention relationship), is based on relating structures with their chromatographic retention parameters (2). This goal has been approached successfully in the past for many chromatographic systems and compound classes (3-9). From those studies, some inferences can be drawn about the conditions that a data set must fulfill so that a chromatographic statistical model built on it has errors of the same magnitude as the experimental error. The three main requirements are the availability of descriptors that account for the retention phenomenon, the homogeneity of the molecular structures in the set, and an adequate number of compounds in the data set. Based on that approach, a set of topological, geometric, electronic, and physical molecular structure descriptors was generated. Multiple linear regression was used to analyze the data.

EXPERIMENTAL SECTION

In this study, a four-stage procedure was followed: (a) capillary gas chromatographic analysis of the stimulants and the narcotics by the Doping Control Laboratory of Athens, (b) molecular structure entry and storage, (c) molecular descriptor generation, and (d) statistical analysis. All the computations were performed on a Sun 4/110 workstation at The Pennsylvania State University by using the ADAPT software system (10-15).

Data Set. The data set included 57 stimulants, narcotics, and some metabolites (Table I), mainly standard compounds. For those substances that were analyzed from urine, a simple extraction procedure was followed. A detailed description of doping analysis is found in ref 16. The internal standard of the analysis was diphenylamine. The instrument used was an HP 5890 gas chromatograph coupled with a nitrogen-phosphorus detector (NPD). All of the compounds of interest for this analysis contain nitrogen. Other experimental conditions were column HP Ultra 1 (poly(dimethylsiloxane)), length 12.5 m, internal diameter 0.2 mm, film thickness 0.33 μm, nitrogen carrier gas, split ratio 1:10, injection volume 2 μL, port temperature 250 °C, detector temperature 300 °C, and temperature program 50 °C/2 min, 25 °C/min, 250 °C/5 min. The total run time was 15 min. The experimental RRT error was approximately 0.005. The range of RRTs was from 0.608 to 1.966, and the range of MWs was from 131 (heptaminol) to 341 (fenethylline).

Structure Entry and Modeling. The molecular structures were entered in the ADAPT software system following a two-stage procedure: (a) sketching them as hydrogen-suppressed diagrams on a graphics terminal (12-15) and storing them as connection and distance tables; (b) minimizing the strain energy of each structure, to improve its geometrical description, by correcting its bond lengths and angles with two molecular mechanics algorithms: MM2 (17, 18) and AM1 (19).

Descriptor Generation. The third step in developing the model was the numerical description of the molecular structures.
Molecular descriptors are measured or calculated values that attempt to quantitatively encode important features of the structure of a particular compound. This information may be topological, geometric, electronic, or physical. A total of 175 descriptors were calculated for each compound in the data set.

Topological descriptors include fragment, molecular connectivity, κ index, and path descriptors. Fragment descriptors, which can be calculated by a simple inspection of the molecular structure, are the counts of atoms, heteroatoms, bonds, rings, substructures, pairs of electrons, etc. Molecular weight is included in this set of descriptors. Molecular connectivity descriptors encode information about the size and the degree of branching in a molecule (2, 20-22). Topological descriptors are calculated from the stored connection tables. κ indexes (23) encode topological molecular shape by using a graph theoretical approach. According to that theory, each structure is depicted as a graph consisting of nodes (atoms) and edges (bonds). Path descriptors are based on the same theory (24). A path is defined as an alternating sequence of nodes and edges that begins and ends with a node.

For the calculation of the geometric descriptors, the prior strain energy minimization of the molecular structures and the calculation of the three-dimensional coordinates are required. These descriptors include principal moments of inertia (25), van der Waals molecular volume (26), length-to-breadth ratio (15, 27), the principal axes of the molecule ("length", "width", and "thickness" of the molecule, calculated from the contribution of the coordinates and the atomic mass of each atom) and their ratios (15), structural
© 1991 American Chemical Society


ANALYTICAL CHEMISTRY, VOL. 63, NO. 18, SEPTEMBER 15, 1991

Table I. Set of Compounds Used in the Models

no.  drug                      RRT exptl
 1   heptaminol                0.608
 2   amphetamine               0.621
 3   phentermine               0.657
 4   propylhexedrine           0.675
 5   methylamphetamine         0.679
 6   ethylamphetamine          0.727
 7   fenfluramine              0.734
 8   dimethylamphetamine       0.738
 9   mephentermine             0.744
10   indole                    0.756
11   norpseudoephedrine        0.785
12   methylethylamphetamine    0.792
13   bemegride                 0.798
14   nicotine                  0.811
15   chlorphentermine          0.822
16   ephedrine                 0.830
17   methoxyphenamine          0.840
18   methylephedrine           0.865
19   phenmetrazine             0.886
20   phendimetrazine           0.906
21   etaphedrine               0.920
22   amphepramone              0.934
23   mefenorex                 0.987
24   diphenylamine             1.000
25   desethylfencamfamin       1.008
26   prolintane                1.023
27   cotinine                  1.040
28   phenacetin                1.042
29   crotethamide              1.044
30   fencamfamin               1.056
31   cropropamide              1.078
32   amphetaminil              1.101
33   pethidine                 1.101
34   meclofenoxate             1.106
35   norpethidine              1.113
36   caffeine                  1.127
37   benzphetamine             1.129
38   ethamivan                 1.165
39   pyrovalerone              1.175
40   doxylamine                1.200
41   theophylline              1.205
42   chlorpheniramine          1.251
43   naphazoline               1.253
44   phacetoperane             1.259
45   dextromethorphan          1.326
46   pipradrol                 1.330
47   methadone                 1.332
48   levorphanol               1.367
49   pentazocine               1.403
50   mazindole                 1.435
51   codeine                   1.471
52   dihydrocodeine            1.475
53   hydrocodone               1.515
54   dimefline                 1.778
55   phenazocine               1.805
56   fentanyl                  1.833
57   fenethylline              1.966

RRT calcd, model I and model II (row and column assignment not recoverable; values in their original order):

0.633 0.704 0.720 0.648 0.710 0.772 0.762 0.699 0.705 0.689 0.828 0.770 0.826 0.808 0.775 0.836 0.814 0.834 0.881 0.924 0.900 0.955 0.903 1.032 0.964 1.051 1.006 0.972 1.028 1.042 1.090 1.097 1.101 1.207 1.097 1.141 1.208

0.641 0.665 0.671 0.719 0.799 0.703 0.737 0.818 0.788 0.820 0.824 0.811 0.847 0.885 0.914 0.955 0.898

1.043

1.100

1.181

1.202 1.221 1.172 1.289 1.250 1.246 1.288 1.282

1.394 1.315 1.441 1.381 1.558 1.530 1.514 1.711 1.738 1.806

1.973

symmetry descriptors (15), and solvent-accessible surface areas and volumes of the molecule (28).

Electronic descriptors encode information about the electronic environment of each molecule. This class includes descriptors that encode information about Hückel molecular orbital calculations (interactions of valence π electrons with adjacent atoms) (15), partial atomic charges, and electronic dipole moments (29, 30). Other descriptors include information about σ electron density, the interatomic distance between the atoms with the most positive and most negative σ charges, and the sum of the absolute values of all atomic σ charges in the structure (31).

The computation of a set of newly derived descriptors (CPSA) (32) was also performed. These descriptors encode information about polar intermolecular interactions. For this calculation,

Table II. Regression Model I for 57 Stimulants and Narcotics

variable^a   regress coeff   std error of regress coeff   partial F
DPSA3         0.00714        0.00058                      149.252
NBND          0.04369        0.00298                      215.466
MOLCB        -0.09395        0.01897                       24.529
V6C           4.58991        0.82314                       31.093
S6C          -2.06121        0.23663                       75.873
ETOT          0.01676        0.00207                       65.553
intercept    -0.12898        0.02771                       21.660

R = 0.991    n = 57    s = 0.046    F(6,50) = 444.1

^a DPSA3, the difference of two sums: the sum of the products of the partial positive surface areas and the atomic charges over all the atoms of the molecule, and the respective sum for the negative surface areas. It belongs to the CPSA group. NBND, number of bonds. MOLCB, the sum of the fourth-order path cluster indexes (see ref 15). V6C, the sum of the sixth-order cluster indexes using valences in the computation. S6C, the sum of the sixth-order cluster indexes. ETOT, a sum of the energies of all the highest and lowest occupied molecular orbitals.
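The partial F values in Table II can be cross-checked against the coefficients and their standard errors. The sketch below assumes the conventional definition, in which the partial F of a single term is the square of the ratio of its coefficient to its standard error; the paper does not state the formula used, and the names `TABLE_II`, `partial_f`, and `relative_gaps` are illustrative only.

```python
# Table II entries: (regression coefficient, standard error, reported partial F).
TABLE_II = {
    "DPSA3": (0.00714, 0.00058, 149.252),
    "NBND": (0.04369, 0.00298, 215.466),
    "MOLCB": (-0.09395, 0.01897, 24.529),
    "V6C": (4.58991, 0.82314, 31.093),
    "S6C": (-2.06121, 0.23663, 75.873),
    "ETOT": (0.01676, 0.00207, 65.553),
    "intercept": (-0.12898, 0.02771, 21.660),
}


def partial_f(coeff, std_err):
    """Squared coefficient-to-standard-error ratio (assumed definition)."""
    return (coeff / std_err) ** 2


def relative_gaps(table):
    """Relative difference between recomputed and reported partial F per term."""
    return {
        name: abs(partial_f(b, se) - f_reported) / f_reported
        for name, (b, se, f_reported) in table.items()
    }


if __name__ == "__main__":
    for name, gap in relative_gaps(TABLE_II).items():
        print(f"{name:10s} relative gap = {gap:.4f}")
```

Under this assumption every recomputed value lies within a few percent of the tabulated one; the residual differences are what one would expect from the rounding of the printed coefficients and standard errors.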

information about the solvent-accessible surface and the electron distribution of a charged surface is required.

Finally, two physical descriptors were calculated: the whole-molecule molar refraction value, using the fragment additivity method developed by Vogel (33), and the molecular polarizability (34).

Statistical Analysis. Statistical analysis was performed in three stages: (a) descriptor elimination, (b) model generation, and (c) model evaluation.

The descriptor elimination was implemented by successive application of various criteria. Descriptors with less than 10% non-zero values were removed. Descriptors with over 80% identical values were also rejected. All the remaining descriptors were examined to see if transformations improved the correlation with the dependent variable. About 20 of the descriptors were transformed to their square or square root values. Pairwise correlations were examined, and one of each pair exceeding 0.95 was removed. Vector space analysis, using the Gram-Schmidt orthogonalization method, was performed to examine multicollinearities among the descriptors. The initial basis was the descriptor that counts the number of bonds. This descriptor had a correlation coefficient with the dependent variable of 0.937, which was the largest among the entire set. By using these procedures, 140 descriptors were excluded.

The remaining pool of 35 descriptors was analyzed by multiple linear regression analysis (35, 36). Starting with the most highly correlated descriptor, descriptors were added stepwise based on F-to-enter values. Then a deletion procedure followed based on F-to-delete values, where each combination of one, two, etc., descriptors was held out in turn (37). In each addition or deletion procedure, a new model was generated. This method generates the best models found by regression analysis.

The last step involved model validation. For this purpose, several criteria were taken into consideration: (a) multiple correlation coefficient R, (b) standard error s, (c) overall F value for analysis of variance, (d) the number of descriptors that were included in the model, and (e) the multicollinearities between the descriptors. The test for multicollinearities was performed by holding each descriptor out in turn and using the remaining descriptors in the model equation as independent variables in a regression analysis attempting to predict the descriptor held out. Variance inflation factors (VIFs) (36) for each descriptor and a mean VIF for each equation were also generated.
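The screening rules and the multicollinearity test described above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the ADAPT implementation: the function names are hypothetical, the thresholds are taken from the text, the choice of which member of a highly correlated pair to drop (here the later column) is an assumption, and the VIF is computed with the usual definition 1/(1 - R_j^2), where R_j^2 comes from regressing descriptor j on the others.

```python
import numpy as np


def screen_descriptors(X, min_nonzero=0.10, max_identical=0.80, max_pair_r=0.95):
    """Return indices of descriptor columns that survive the screening rules."""
    n, p = X.shape
    candidates = []
    for j in range(p):
        col = X[:, j]
        # Rule 1: drop descriptors with fewer than 10% non-zero values.
        if np.count_nonzero(col) / n < min_nonzero:
            continue
        # Rule 2: drop descriptors with more than 80% identical values.
        _, counts = np.unique(col, return_counts=True)
        if counts.max() / n > max_identical:
            continue
        candidates.append(j)
    # Rule 3: for each pair with |r| > 0.95, keep the earlier column (assumption).
    selected = []
    for j in candidates:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) <= max_pair_r
               for k in selected):
            selected.append(j)
    return selected


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on all other columns."""
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        # Design matrix: intercept plus every descriptor except the one held out.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)
```

A VIF of 1 means the held-out descriptor is uncorrelated with the others; the mean of the returned array corresponds to the mean VIF quoted for each equation. The transformation step and the Gram-Schmidt vector space analysis are not covered by this sketch.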

RESULTS AND DISCUSSION

The best equation developed by the procedure described above is presented in Table II. The statistics associated with the model are R = 0.991 and s = 0.046 (approximately 4% of the mean RRT). The calculated vs experimental RRTs are plotted in Figure 1, and the residuals vs the calculated RRTs are plotted in Figure 2. The mean VIF for the model was

Figure 1. Plot of calculated vs observed relative retention times for 57 stimulants and narcotics. [Plot not reproduced; horizontal axis: Observed RRT, 0.50 to 2.00.]

[Figure 2, residuals vs calculated RRTs: plot not reproduced; residual axis from -0.088 to 0.110. Compound labels in the figure area: amphetamine, dimethylamphetamine, ephedrine, meclofenoxate.]

Figure 3. Sample of molecular structures of stimulants included in the data set of model II.

... developments of the relation between chromatographic retention and molecular connectivity are found in refs 2 and 22. Of the remaining descriptors of Table II, DPSA3 belongs to the CPSA class of descriptors (32). These descriptors, as mentioned above, encode features of molecules that may be responsible for polar interactions. Indeed, many polar groups exist in the compounds of the data set, mainly primary and secondary amines and hydroxy groups. The last descriptor, ETOT, belongs to the class of descriptors that are based on Hückel molecular orbital calculations (15). They provide useful information related to the properties and reactivities of organic π electron systems.

Both DPSA3 and ETOT are electronic descriptors. These descriptors are related to the electron pair donor-acceptor forces of the chromatographic retention phenomenon. To examine the extent to which these forces contribute in the present system, the generation of a model based on a subset of compounds was undertaken. Only compounds without polar groups were included in the subset. This trial failed to generate a model because the subset was insufficient. Next, the opposite experiment was performed, i.e., the generation of a model for the 57 compounds based only on electronic descriptors. The statistics of that model are R = 0.963, n = 57, s = 0.089, F(3,53) = 224.8, and mean VIF = 1.5. The descriptors are ETOT and two CPSAs.

From the last two paragraphs, the general conclusion is that topological and electronic descriptors are quite useful in QSRR studies. Although the multiple correlation coefficient of the proposed model I is good, the standard error is somewhat high (Table II). This may be due to the reasons referred to in the Introduction section.
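The overall F values quoted for model I and for the electronic-only model can be related to R, n, and the number of descriptors p through the standard identity F = (R^2/p) / ((1 - R^2)/(n - p - 1)). The sketch below is only a consistency check under that textbook formula; the function name `overall_f` is illustrative, and because R is printed to three decimals the recomputed values can only approximate the reported ones.

```python
def overall_f(r, n, p):
    """Overall regression F statistic from multiple R, sample size n, and p descriptors."""
    r2 = r * r
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Model I (Table II): R = 0.991, n = 57, six descriptors, reported F(6,50) = 444.1.
f_model_i = overall_f(0.991, 57, 6)

# Electronic-only model: R = 0.963, n = 57, three descriptors, reported F(3,53) = 224.8.
f_electronic = overall_f(0.963, 57, 3)

if __name__ == "__main__":
    print(f"model I:         F(6,50) ~ {f_model_i:.1f}")
    print(f"electronic only: F(3,53) ~ {f_electronic:.1f}")
```

The recomputed values come out near 457 and 226, against the reported 444.1 and 224.8. The denominators n - p - 1 reproduce the degrees of freedom given in the text (50 and 53), and the gap for model I is within what rounding R to 0.991 allows.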
To examine the possibility of reducing the standard error of the prediction of the RRTs by model I, a subset of stimulants with structures similar to those presented in Figure 3 was created. The compounds of that subset are indicated in the last column of Table I. Model II (mean VIF