Application of Information Theory and Numerical Taxonomy to the Selection of Gas-Liquid Chromatography Stationary Phases

Arie Eskes, Foppe Dupuis, and Auke Dijkstra
Analytisch Chemisch Laboratorium, Rijks Universiteit te Utrecht, Croesestraat 77A, Utrecht, The Netherlands

Henri De Clercq and Desire L. Massart
Farmaceutisch Instituut, Vrije Universiteit Brussel, Paardenstraat 67, B-1640 Sint Genesius Rode, Belgium
By using information theory and numerical taxonomy, the selection of GLC stationary phases from a set of 16 has been discussed. The results obtained by applying these techniques are very similar when the selection is based in both cases on correlation coefficients. This is due to the fact that the information obtained when using a combination of columns appears to be determined largely by the correlation between these columns. The set of columns selected depends on the set of compounds to which the selection procedure is applied. Results are given for a general set of 248 compounds and subsets of aliphatic alcohols, aldehydes, ketones, and esters. In all cases, one apolar stationary phase in combination with one or more polar phases was selected. The numerical taxonomic and information theoretical selection procedures are compared and their respective advantages and disadvantages discussed. It is concluded that both techniques are useful mathematical tools for the classification and combination of chromatographic techniques.
The number of stationary liquid phases (columns) available for use in gas-liquid chromatography (GLC) is very large. One of the problems a separation chemist is confronted with is to make a choice from this large number. Quite often, the columns are selected from those available in the laboratory, and use is made of acquired experience. Of course, this choice is not always optimal. There appears to be a need for objective methods of standardization, classification, or selection. This necessity has been pointed out by several authors (1-6).

In 1966, Rohrschneider (7) gave a definition of the polarity and presented a method to calculate this polarity from the retention indices of five substances. This allows a characterization of stationary phases. McReynolds (1) used the same approach to characterize 226 columns. It appeared that many of these columns are very much alike. Weiner and Parcher (8) showed that factors obtained from a factor analysis yield a tool for characterization, and Wold (9) indicated that classification is possible by making use of principal component analysis.

Although a characterization scheme can be used to select stationary phases by making use of rules such as “an apolar column will be suitable for the separation of apolar substances”, the need for more sophisticated classification procedures has been expressed, for example by Leary et al. (3). These authors used a nearest neighbor technique to select a number of liquid phases from McReynolds' compilation (1). More formal classification techniques were described by Massart and coworkers (10), who used numerical taxonomy to classify 226 liquid phases from McReynolds' compilation (1) into groups of similar stationary phases, and by Wold (11), who applied a pattern recognition technique to the same set of liquid phases.

Moffat et al. (12) introduced the concept of discriminating power for a series of chromatographic systems in terms of the probability of separating two compounds, selected at random from a specific population, in at least one of the systems. In fact, this is a measure of the information obtained by combining several chromatographic systems. Dupuis and Dijkstra (13) used information theory to select an optimal set of columns for retrieval purposes. This technique takes into account the distribution of the retention indices and eliminates the correlations between the retention indices on the several columns.

It was found (10) that numerical taxonomy yields results similar to those of the selection procedure described by Dupuis (13) in the case of the selection of a set of preferred phases from a set of 10. It seemed interesting to study the relationship between both techniques in more detail. In the present paper, therefore, both numerical taxonomy and information theory have been applied to select a few stationary phases from a set of 16, using different sets of retention indices of compounds, e.g., a general set and sets of alcohols, esters, and aldehydes/ketones. In order to study the applicability of the selection and classification procedures, columns for general use as well as columns for special purposes have been included. An answer to the problem of which columns and how many columns are to be recommended for general or special purposes requires a more elaborate study. This requires retention data for more columns and probably the introduction of other selection criteria such as stability, availability, and temperature range of the stationary phases (14).
THEORY

To solve the problem of choosing the best combination of stationary phases in GLC, two methods have been applied: information theory and numerical taxonomy. The first method has been described in detail by Dupuis and Dijkstra (13). The second method has been applied by Massart et al. to GLC (10) and to thin layer chromatography (15).

The information theoretical approach can be summarized as follows. If one stationary phase is to be selected, the column with the best separation is chosen or, in other words, the column with the widest distribution (the largest standard deviation) of retention indices. If the distribution of retention indices on a stationary phase can be approximated by a normal (Gaussian) distribution, the amount of information in bits can be calculated with Equation 1, derived from Shannon's equation (16).
$$I = \tfrac{1}{2}\,\mathrm{ld}\,\frac{\sigma_m^2}{\sigma_e^2} \tag{1}$$

where ld stands for $\log_2$, and $\sigma_m^2$ and $\sigma_e^2$ are the variances of the distribution function of the retention indices and that of the errors, respectively. The column with the largest standard deviation yields the maximum amount of information, assuming that the errors are equal for each column.

ANALYTICAL CHEMISTRY, VOL. 47, NO. 13, NOVEMBER 1975

If the amount of information obtained from one index measurement is not sufficient (e.g., for a retrieval procedure), retention indices measured on other columns can supply additional information. In the case of normal distributions, the amount of information obtained from n columns together is given by

$$I = \tfrac{1}{2}\,\mathrm{ld}\,\frac{|\mathrm{cov}_m|}{|\mathrm{cov}_e|} \tag{2}$$
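where $|\mathrm{cov}_m|$ is the determinant of the covariance matrix of the retention indices and $|\mathrm{cov}_e|$ that of the errors. The covariance matrix is defined as

$$\mathrm{cov} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1n} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{n1} & \sigma_{n2} & \cdots & \sigma_{nn} \end{pmatrix} \tag{3}$$

where $\sigma_{ii} = \sigma_i^2$ is the variance of the retention indices on column i, $\sigma_{ij}$ is the covariance of the indices for columns i and j, and n is the number of columns. Variances and covariances are related to the correlation coefficient $\rho_{ij}$ via the equation

$$\rho_{ij} = \frac{\sigma_{ij}}{\sigma_i \sigma_j} \tag{4}$$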
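As an illustration of Equations 1 to 4, the following minimal Python sketch (not the authors' FORTRAN program) estimates the amount of information for a chosen set of columns from a table of retention indices. It assumes NumPy is available, takes Equation 2 as half the binary logarithm of the ratio of the two determinants, and assumes that the errors on different columns are uncorrelated with a common standard deviation, so that the error covariance matrix is diagonal. The function name `information_bits` and the argument `error_sd` are hypothetical.

```python
import numpy as np


def information_bits(indices, error_sd=2.0):
    """Amount of information (bits) carried by one or more columns.

    indices  : retention indices, shape (compounds,) or (compounds, columns)
    error_sd : assumed standard deviation of the measurement error,
               taken to be equal and uncorrelated for every column
               (an assumption of this sketch)
    """
    x = np.asarray(indices, dtype=float)
    if x.ndim == 1:
        x = x[:, None]                      # a single column
    n = x.shape[1]

    # Sample covariance matrix of the retention indices (Equation 3).
    cov_m = np.atleast_2d(np.cov(x, rowvar=False))

    # log|cov_m| via slogdet, which is numerically safer than det().
    sign, logdet_m = np.linalg.slogdet(cov_m)
    if sign <= 0:
        raise ValueError("covariance matrix is not positive definite")

    # |cov_e| = error_sd**(2n) for a diagonal error covariance matrix.
    logdet_e = n * np.log(error_sd ** 2)

    # Equation 2 (Equation 1 when n == 1), converted from nats to bits.
    return 0.5 * (logdet_m - logdet_e) / np.log(2.0)
```

For a single column the function reduces to Equation 1; for uncorrelated columns the amounts of information simply add, and any correlation lowers the total.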
Obviously, the set of n columns producing the maximum amount of information corresponds to a maximum value of the n × n determinant of the covariance matrix for the columns selected.

In the study of Massart et al. (10) on the classification of liquid phases in GLC by numerical taxonomy, the taxonomic distance was used as a measure of the resemblance of the stationary phases. This distance is defined as

$$D_{ij} = \left[ \frac{1}{N} \sum_{k=1}^{N} \left( RI_{ik} - RI_{jk} \right)^2 \right]^{1/2} \tag{5}$$

where $RI_{ik}$ and $RI_{jk}$ are the retention indices of compound k on columns i and j, and N is the number of retention indices taken into account. In the comparison between the selection of phases by the information theoretical approach and the classification by numerical taxonomy presented in this study, correlation coefficients (Equation 4) as well as taxonomic distances (Equation 5) are used as similarity coefficients.

In the classification by taxonomy, an n × n resemblance matrix containing either the correlation coefficients or the taxonomic distances is constructed. The reduction of this matrix is carried out by a weighted pair group method using the arithmetic average (17). The smallest distance $D_{ij}$ or highest correlation coefficient $\rho_{ij}$ is selected: i and j are the most similar liquid phases and are considered to form one group $i'$. The similarity coefficient between the new group $i'$ and all other phases (for example $l$) is then calculated as follows (for instance for the distances):

$$D_{l i'} = \tfrac{1}{2}\left( D_{li} + D_{lj} \right) \tag{6}$$
The total number of rows and columns in the resemblance matrix is therefore reduced by one. This process is repeated until all liquid phases are classified in one non-overlapping hierarchic system of groups and subgroups. The last reduction links the two most different clusters, whereas before the last-but-one reduction the three most different clusters remain. To obtain the final set of, for example, 3 preferred phases from the original set, a selection will be made from each of these three groups.

Table I. List of Stationary Phases

Column No.   Stationary phase
 1           Squalane
 2           Apiezon L
 3           SE-30
 4           Diisodecyl phthalate
 5           Polyphenyl ether-6 rings
 6           Bis(ethoxyethyl) phthalate
 7           Carbowax 20M
 8           Diethyl glycol succinate
 9           Tricresyl phosphate
10           Diglycerol
11           Zonyl E7
12           QF-1
13           Hyprose SP80
14           Triton X-305
15           XF 1150
16           Quadrol
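The taxonomic distance and the weighted pair-group reduction described in the Theory section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the program used by the authors: it assumes NumPy, takes the taxonomic distance as the root-mean-square difference of the retention indices, and uses the plain average of the two merged rows as the update rule. The function names `taxonomic_distances` and `wpgma` are hypothetical.

```python
import numpy as np


def taxonomic_distances(ri):
    """Taxonomic distance matrix between columns (Equation 5).

    ri : retention indices, shape (N compounds, n columns)
    """
    x = np.asarray(ri, dtype=float)
    diff = x[:, :, None] - x[:, None, :]        # shape (N, n, n)
    return np.sqrt((diff ** 2).mean(axis=0))    # root-mean-square difference


def wpgma(dist, n_groups):
    """Reduce a distance matrix until `n_groups` clusters remain.

    At each step the two closest groups i and j are merged into i', and
    the distance from any other group l to i' is (D_li + D_lj) / 2.
    Returns the groups as lists of 0-based column positions.
    """
    d = np.array(dist, dtype=float)
    groups = [[i] for i in range(d.shape[0])]
    while len(groups) > n_groups:
        # Ignore the zero diagonal when searching for the smallest distance.
        masked = d + np.diag(np.full(len(groups), np.inf))
        i, j = np.unravel_index(np.argmin(masked), masked.shape)
        i, j = min(i, j), max(i, j)

        merged = 0.5 * (d[i] + d[j])            # weighted pair-group average
        d[i, :] = merged
        d[:, i] = merged
        d[i, i] = 0.0
        d = np.delete(np.delete(d, j, axis=0), j, axis=1)

        groups[i] = groups[i] + groups[j]
        del groups[j]
    return [sorted(g) for g in groups]
```

When correlation coefficients are used instead of distances, passing `1 - r` as the distance matrix gives the same grouping as selecting the highest coefficient and averaging the coefficients directly, because the conversion is linear.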
CALCULATIONS

Information theory and numerical taxonomy have been applied to four data sets. The data sets were taken from the compilation of McReynolds (18). Sixteen stationary phases were considered; they are listed in Table I. The stationary phases cover a wide polarity range. The principal object of this article is to evaluate the two selection procedures. Therefore, a small and more or less arbitrary subset including stationary phases for general use and special purposes was chosen from the compilation in question. A complete classification by numerical taxonomy of 226 phases has been published by one of us (10).

A sample of retention indices of 248 compounds on 16 stationary phases was taken (whole library). This sample is composed of a wide variety of compounds, which are listed in Table II. From this data set of 248 compounds, three subsets were taken, viz., a subset of 48 alcohols, a subset of 35 aldehydes/ketones, and one of 60 esters (Nos. 1-48, 49-83, and 84-143 in Table II, respectively).

For each set of indices, estimates $s_i^2$, $s_{ij}$, and $r_{ij}$ of the variance $\sigma_i^2$, covariance $\sigma_{ij}$, and correlation coefficient $\rho_{ij}$ were calculated in the usual way. Using the values of $s_i^2$, the information per column was calculated with Equation 1. A value of 2 index units was used as an estimate of the standard deviation of the error distribution. It has been assumed that the distribution of retention indices on each column can be approximated by a normal (Gaussian) distribution. This approximation has been tested with the $\chi^2$ test (19), which showed that the assumption was justified.

As has been indicated, the amount of information obtained from n columns can be calculated with Equation 2. The set of n columns which yields a maximum amount of information will be the set for which the determinant of the covariance matrix is maximal. In our case, $\binom{16}{n}$ determinants have to be calculated. This is done for n = 2 and n = 3.
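The exhaustive search over all subsets of n columns can be sketched as below. This is a minimal illustration, assuming NumPy and equal, uncorrelated errors on every column, so that maximizing the amount of information is the same as maximizing the determinant of the covariance sub-matrix. The function name `best_subset` and the dictionary `counts` are hypothetical.

```python
from itertools import combinations
from math import comb

import numpy as np


def best_subset(cov, n):
    """Return (columns, determinant) for the n-column subset whose
    covariance sub-matrix has the largest determinant.

    cov : full covariance matrix of the retention indices
    n   : number of columns to select
    Column positions are 0-based.
    """
    cov = np.asarray(cov, dtype=float)
    best_cols, best_det = None, -np.inf
    for cols in combinations(range(cov.shape[0]), n):
        # Sub-matrix holding only the rows and columns of this subset.
        det = np.linalg.det(cov[np.ix_(cols, cols)])
        if det > best_det:
            best_cols, best_det = cols, det
    return best_cols, best_det


# Number of determinants to evaluate for 16 stationary phases.
counts = {n: comb(16, n) for n in (2, 3, 4, 5)}
# counts == {2: 120, 3: 560, 4: 1820, 5: 4368}
```

The counts show why an exhaustive search is cheap for n = 2 and n = 3 and grows quickly afterwards.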
For n = 4 and n = 5, the selection procedure described by Dupuis and Dijkstra (13) has been applied.

In order to apply numerical taxonomy, taxonomic distances were calculated using Equation 5 in addition to the correlation coefficients. The reduction of the resemblance matrix was carried out as described in the previous section.

The calculations were performed on the CDC 73/26 computer of the Academic Computer Centre of the State University of Utrecht and on the CDC 6400 computer of the University of Brussels. All programs were written in FORTRAN.

Table II. List of Compounds

No.  Compound
  1  Methanol
  2  Ethanol
  3  Propanol
  4  Isopropanol
  5  Butanol
  6  Isobutanol
  7  sec-Butanol
  8  tert-Butanol
  9  Cyclopropylcarbinol
 10  Pentanol
 11  2-Pentanol
 12  3-Pentanol
 13  3-Methyl-2-butanol
 14  2,2-Dimethyl-1-propanol
 15  Hexanol
 16  2-Methyl-1-pentanol
 17  3-Methyl-1-pentanol
 18  4-Methyl-1-pentanol
 19  2-Methyl-2-pentanol
 20  3-Methyl-2-pentanol
 21  4-Methyl-2-pentanol
 22  2-Methyl-3-pentanol
 23  3-Methyl-3-pentanol
 24  2-Ethyl-1-butanol
 25  2,2-Dimethyl-1-butanol
 26  2,3-Dimethyl-2-butanol
 27  3,3-Dimethyl-2-butanol
 28  Heptanol
 29  2-Heptanol
 30  3-Heptanol
 31  4-Heptanol
 32  2,2-Dimethyl-1-pentanol
 33  2,4-Dimethyl-3-pentanol
 34  Octanol
 35  2-Octanol
 36  2-Ethyl-1-hexanol
 37  Cyclopentanol
 38  Cyclohexanol
 39  2-Propen-1-ol
 40  2-Propyn-1-ol
 41  2-Buten-1-ol
 42  3-Buten-2-ol
 43  2-Methyl-2-propen-1-ol
 44  3-Penten-1-ol
 45  1-Penten-3-ol
 46  1-Penten-4-ol
 47  2-Methyl-3-buten-2-ol
 48  2-Methyl-3-butyn-2-ol
 49  Formaldehyde
 50  Propionaldehyde
 51  Butyraldehyde
 52  Isobutyraldehyde
 53  Valeraldehyde
 54  Isovaleraldehyde
 55  2-Methylbutyraldehyde
 56  2,2-Dimethylpropionaldehyde
 57  Hexanal
 58  Heptanal
 59  2-Ethylhexanal
 60  Acrolein
 61  Methacrolein
 62  Crotonaldehyde
 63  2-Ethyl-2-butenal
 64  2-Ethyl-2-hexanal
 65  Acetone
 66  2-Butanone
 67  2-Pentanone
 68  3-Pentanone
 69  3-Methyl-2-butanone
 70  2-Hexanone
 71  3-Hexanone
 72  3-Methyl-2-pentanone
 73  4-Methyl-2-pentanone
 74  3,3-Dimethyl-2-butanone
 75  2-Heptanone
 76  4,4-Dimethyl-2-pentanone
 77  2,4-Dimethyl-3-pentanone
 78  2-Octanone
 79  Cyclohexanone
 80  5-Hexen-2-one
 81  2,3-Butanedione
 82  2,3-Pentanedione
 83  2,4-Pentanedione
 84  Methyl formate
 85  Ethyl formate
 86  Propyl formate
 87  Isopropyl formate
 88  Butyl formate
 89  Isobutyl formate
 90  sec-Butyl formate
 91  2-Pentyl formate
 92  3-Pentyl formate
 93  Hexyl formate
 94  Allyl formate
 95  Methyl acetate
 96  Ethyl acetate
 97  Propyl acetate
 98  Isopropyl acetate
 99  Butyl acetate
100  Isobutyl acetate
101  sec-Butyl acetate
102  tert-Butyl acetate
103  Pentyl acetate
104  Isopentyl acetate
105  2-Pentyl acetate
106  3-Pentyl acetate
107  2-Methyl-2-butyl acetate
108  Hexyl acetate
109  4-Methyl-2-pentyl acetate
110  2-Ethyl-1-butyl acetate
111  Heptyl acetate
112  Cyclohexyl acetate
113  Vinyl acetate
114  Allyl acetate
115  Isopropenyl acetate
116  Methylene diacetate
117  Ethylidene diacetate
118  Ethylene diacetate
119  Methyl propionate
120  Ethyl propionate
121  Propyl propionate
122  Isopropyl propionate
123  Butyl propionate
124  Pentyl propionate
125  Isopentyl propionate
126  2-Pentyl propionate
127  Methyl butyrate
128  Ethyl butyrate
129  Propyl butyrate
130  Isopropyl butyrate
131  Butyl butyrate
132  Isobutyl butyrate
133  Pentyl butyrate
134  Isopentyl butyrate
135  Vinyl butyrate
136  Methyl isobutyrate
137  Butyl isobutyrate
138  Isobutyl isobutyrate
139  Methyl acrylate
140  Ethyl acrylate
141  Propyl acrylate
142  Butyl acrylate
143  Allyl acrylate
144  Methylal
145  Ethyl methyl formal
146  Isopropyl methyl formal
147  Diethyl formal
148  Propyl ethyl formal
149  Isopropyl ethyl formal
150  sec-Butyl ethyl formal
151  Dipropyl formal
152  sec-Butyl propyl formal
153  Diisopropyl formal
154  Dibutyl formal
155  Diisobutyl formal
156  Di-sec-butyl formal
157  Ethylene glycol formal
158  1,2-Propylene glycol formal
159  1,3-Propylene glycol formal
160  1,3-Butylene glycol formal
161  2,3-Butylene glycol formal
162  1,4-Butylene glycol formal
163  Neopentyl glycol formal
164  6-Methyl-2,4,7-trioxaoctane
165  Dimethyl acetal
166  Diethyl acetal
167  Dipropyl acetal
168  Diisobutyl acetal
169  Ethylene glycol acetal
170  1,3-Butylene glycol acetal
171  Dimethyl butyral
172  1,3-Butylene glycol butyral
173  Acrolein diethyl acetal
174  4,4-Dimethoxy-2-butanone
175  Methyl ether
176  Propyl methyl ether
177  Butyl methyl ether
178  tert-Butyl methyl ether
179  Ethyl ether
180  Butyl ethyl ether
181  tert-Butyl ethyl ether
182  Allyl ethyl ether
183  Propyl ether
184  Isopropyl propyl ether
185  Isopropyl ether
186  tert-Butyl isopropyl ether
187  Butyl ether
188  Pentyl ether
189  Isopentyl ether
190  2-Ethyl-1-butyl ether
191  Ethyl vinyl ether
192  Butyl vinyl ether
193  Isobutyl vinyl ether
194  2-Ethyl-1-hexyl vinyl ether
195  Tetrahydrofuran
196  2-Methyl-1,2-propylene oxide
197  2-Methyl tetrahydrofuran
198  Furan
199  2-Methylfuran
200  2,5-Dimethyl tetrahydrofuran
201  3,4-Dihydropyran
202  Tetrahydropyran
203  Ethane
204  Butane
205  Hexane
206  Octane
207  Decane
208  Dodecane
209  Tetradecane
210  Hexadecane
211  Octadecane
212  Benzene
213  Toluene
214  o-Xylene
215  m-Xylene
216  p-Xylene
217  Ethylbenzene
218  o-Diethylbenzene
219  m-Diethylbenzene
220  p-Diethylbenzene
221  Methylene chloride
222  Ethylene chloride
223  Chloroform
224  3-Hydroxy-2-butanone
225  4-Hydroxy-2-butanone
226  1-Hydroxy-2-methyl-3-butanone
227  3-Methoxybutanal
228  2-Methoxyethanol
229  2-Butoxyethanol
230  2-Allyloxyethanol
231  2-Methoxy-1-propanol
232  2-Ethoxy-1-propanol
233  3-Ethoxy-1-propanol
234  1-Methoxy-2-propanol
235  1-Propoxy-2-propanol
236  1-Ethoxy-3-pentanol
237  4-Methoxy-4-methyl-2-pentanol
238  3-Methoxy-1-butyl acetate
239  2-Methoxyethyl acetate
240  2-Ethoxyethyl acetate
241  Acetonyl acetate
242  3-Methoxy-1-butyl acrylate
243  Methoxymethylal
244  Dimethoxymethylal
245  1,4-Dioxane
246  Trioxane
247  1,3,5-Trioxepane
248  Water

Table III. Correlation Coefficients between the Columns for the Whole Library

 No.     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
   1     1
   2  0.99     1
   3  0.99  0.99     1
   4  0.97  0.98  0.99     1
   5  0.93  0.93  0.95  0.97     1
   6  0.87  0.88  0.91  0.96  0.96     1
   7  0.75  0.77  0.80  0.88  0.89  0.97     1
   8  0.56  0.59  0.63  0.73  0.77  0.88  0.95     1
   9  0.91  0.91  0.94  0.98  0.97   1.0  0.96  0.84     1
  10  0.32  0.34  0.39  0.50  0.53  0.69  0.81  0.86  0.65     1
  11  0.82  0.83  0.87  0.91  0.93  0.95  0.90  0.83  0.94  0.61     1
  12  0.87  0.87  0.90  0.93  0.94  0.94  0.87  0.77  0.94  0.53  0.98     1
  13  0.62  0.64  0.69  0.78  0.79  0.91  0.97  0.95  0.89  0.89  0.83  0.78     1
  14  0.80  0.82  0.85  0.91  0.92  0.98  0.99  0.92  0.97  0.79  0.92  0.89  0.95     1
  15  0.71  0.73  0.77  0.85  0.88  0.96  0.98  0.95  0.93  0.80  0.94  0.91  0.95  0.97     1
  16  0.70  0.71  0.75  0.84  0.84  0.94  0.98  0.93  0.92  0.85  0.85  0.81  0.99  0.97  0.96     1

Table IV. Correlation Coefficients between the Columns for the Subset Alcohols

 No.     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
   1     1
   2   1.0     1
   3   1.0   1.0     1
   4  0.99  0.99   1.0     1
   5  0.98  0.98  0.99  0.99     1
   6  0.95  0.95  0.97  0.98  0.98     1
   7  0.86  0.87  0.90  0.92  0.94  0.98     1
   8  0.78  0.79  0.82  0.85  0.88  0.93  0.99     1
   9  0.96  0.97  0.98  0.99  0.99   1.0  0.96  0.91     1
  10  0.39  0.41  0.46  0.49  0.55  0.65  0.78  0.86  0.61     1
  11  0.98  0.98  0.99   1.0  0.99  0.99  0.93  0.87  0.99  0.54     1
  12  0.99  0.99   1.0  0.99  0.99  0.97  0.90  0.83  0.98  0.48  0.99     1
  13  0.85  0.85  0.88  0.90  0.92  0.97  0.99  0.98  0.96  0.81  0.92  0.89     1
  14  0.90  0.90  0.93  0.94  0.96  0.99   1.0  0.97  0.98  0.74  0.95  0.93  0.99     1
  15  0.91  0.91  0.94  0.95  0.96  0.99  0.99  0.97  0.98  0.74  0.96  0.94  0.99   1.0     1
  16  0.90  0.90  0.93  0.95  0.96  0.99  0.99  0.96  0.98  0.74  0.96  0.94  0.99  0.99   1.0     1

Table V. Correlation Coefficients between the Columns for the Subset Aldehydes/Ketones

 No.     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
   1     1
   2  0.99     1
   3  0.99   1.0     1
   4  0.99   1.0   1.0     1
   5  0.97  0.99  0.99  0.99     1
   6  0.97  0.98  0.98  0.99  0.99     1
   7  0.93  0.95  0.95  0.96  0.98  0.99     1
   8  0.87  0.89  0.89  0.92  0.94  0.96  0.99     1
   9  0.98  0.99  0.99   1.0   1.0   1.0  0.98  0.95     1
  10  0.13  0.07  0.09  0.08  0.11  0.16  0.27  0.30  0.15     1
  11  0.96  0.97  0.98  0.99  0.99  0.99  0.97  0.95  0.99  0.10     1
  12  0.96  0.98  0.98  0.99  0.99  0.99  0.97  0.94  0.99  0.08   1.0     1
  13  0.93  0.94  0.95  0.96  0.98  0.99  0.99  0.98  0.98  0.26  0.98  0.98     1
  14  0.94  0.93  0.94  0.94  0.95  0.96  0.98  0.96  0.96  0.41  0.94  0.93  0.97     1
  15  0.93  0.95  0.96  0.97  0.99  0.99  0.99  0.98  0.99  0.19  0.99  0.98   1.0  0.96     1
  16  0.96  0.97  0.97  0.98  0.99   1.0  0.99  0.97  0.99  0.21  0.99  0.99   1.0  0.97  0.99     1

Table VI. Correlation Coefficients between the Columns for the Subset Esters

 No.     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15    16
   1     1
   2   1.0     1
   3   1.0   1.0     1
   4  0.99  0.99   1.0     1
   5  0.96  0.97  0.98  0.99     1
   6  0.94  0.94  0.96  0.98  0.99     1
   7  0.88  0.88  0.91  0.94  0.97  0.99     1
   8  0.76  0.77  0.80  0.85  0.90  0.93  0.98     1
   9  0.96  0.96  0.98  0.99   1.0   1.0  0.97  0.91     1
  10  0.60  0.61  0.65  0.71  0.78  0.83  0.90  0.97  0.79     1
  11  0.92  0.92  0.95  0.97  0.98  0.99  0.98  0.94  0.99  0.85     1
  12  0.96  0.96  0.98  0.99  0.99  0.99  0.97  0.90  0.99  0.79  0.99     1
  13  0.89  0.89  0.92  0.94  0.97  0.99   1.0  0.97  0.98  0.90  0.99  0.97     1
  14  0.91  0.91  0.94  0.96  0.98   1.0   1.0  0.96  0.99  0.87  0.99  0.98   1.0     1
  15  0.89  0.89  0.92  0.95  0.97  0.99   1.0  0.97  0.98  0.89  0.99  0.98   1.0   1.0     1
  16  0.94  0.94  0.96  0.98  0.99   1.0  0.99  0.94   1.0  0.84  0.99  0.99  0.99  0.99  0.99     1

Table VII. Combinations of Columns Which Yield a Maximum Amount of Information and the Classification of Columns for the Whole Library

No. of    Obtained combination   Total amount of       Classification of columns with
columns   of columns             information (bits)    Correlation coefficients            Taxonomic distances
1         13                      6.8
2         10,12                  13.5                  1-6,9,11,12/7,8,10,13-16            1-7,9,11,12,14/8,10,13,15,16
3         1,8,10                 19.2                  1-6,9,11,12/7,8,13-16/10            1-4/5-7,9,11,12,14/8,10,13,15,16
4         2,8,10,11              24.4                  1-6,9/7,8,13-16/11,12/10            1-4/5-7,9,11,12,14/8,10/13,15,16
5         2,8,10,11,13           29.1                  1-6,9/7,13-16/11,12/10/8            1-4/5-7,9,11,12,14/8/10/13,15,16

Table VIII. Combinations of Columns Which Yield a Maximum Amount of Information and the Classification of Columns for the Subset Alcohols

No. of    Obtained combination   Total amount of       Classification of columns with
columns   of columns             information (bits)    Correlation coefficients            Taxonomic distances
1         8                       6.4
2         2,8                    12.0                  10/1-9,11-16                        1-6,9,11,12/7,8,10,13-16
3         8,10,12                15.8                  10/1-5,11,12/6-9,13-16              1-6,9,11,12/8,10,13/7,14-16
4         2,8,10,16              18.9                  10/1-5,11,12/6,7,9,13-16/8          1-3/4-6,9,11,12/8,10,13/7,14-16
5         2,7,8,10,16            21.3                  10/1-5,11,12/7,13-16/6,9/8          1-3/4-6,9,11,12/8,10,13/7,14,15/16

Table IX. Combinations of Columns Which Yield a Maximum Amount of Information and the Classification of Columns for the Subset Aldehydes/Ketones

No. of    Obtained combination   Total amount of       Classification of columns with
columns   of columns             information (bits)    Correlation coefficients            Taxonomic distances
1         8                       6.4
2         10,12                  12.9                  10/1-9,11-16                        1-4/5-16
3         1,8,10                 17.9                  10/1-4/5-9,11-16                    1-4/5-7,9,11-16/8,10
4         1,8,10,11              21.3                  10/1-4/14/5-9,11-13,15,16           1-4/5-7,9,11-16/8/10
5         1,8,10,11,12           24.0                  10/8/14/1-4/5-7,9,11-13,15,16       1-4/5,6,9,14/7,11-13,15,16/8/10

Table X. Combinations of Columns Which Yield a Maximum Amount of Information and the Classification of Columns for the Subset Esters

No. of    Obtained combination   Total amount of       Classification of columns with
columns   of columns             information (bits)    Correlation coefficients            Taxonomic distances
1         12                      6.4
2         1,8                    12.1                  8,10/1-7,9,11-16                    1-7,9,12,14,16/8,10,11,13,15
3         2,8,11                 16.0                  1-4/8,10/5-7,9,11-16                1-4/5-7,9,12,14,16/8,10,11,13,15
4         2,8,10,11              18.9                  1-4/5-7,9,11-16/8/10                1-4/5-7,9,12,14,16/8/10,11,13,15
5         2,8,10,11,15           21.1                  1-4/5,6,9,11,12,16/7,13-15/8/10     1-4/5,9/6,7,12,14,16/8/10,11,13,15

RESULTS AND DISCUSSION

The amounts of information obtained from one column, as calculated with Equation 1, are about 6.5 bits, depending slightly on the stationary phase and the set of compounds. These surprisingly small differences might be ascribed to the fact that the columns considered have proved to be useful in chromatographic practice. The differences amount to about 0.3 bit for a particular set of compounds. Generally speaking, the amounts of information per column obtained for the whole library are slightly larger than the information for the three subsets. However, column 10 (diglycerol) yields a much larger amount of information for the whole library. This might be ascribed to a unique separation mechanism for this column. One may then expect that there exists only a relatively small correlation between column 10 and the other columns. This is confirmed by the values of the correlation coefficients in Tables III-VI. It also explains why column 10 appears as a separate class in three of the data sets when these are divided into 3 groups (Tables VII-X). The special characteristics of diglycerol are also reflected by the fact that it is one of the 16 out of 226 phases characterized by McReynolds (1) which are found to be “abnormal” (9). Numerical taxonomy on McReynolds' set of phases also yields a separate class for diglycerol (10).

The correlation coefficients differ from set to set. Generally speaking, the correlations for the alcohols and esters are high, for the aldehydes/ketones rather low. In Tables VII-X, the results of the selection of columns by applying information theory and the classification by numerical taxonomy are given. From these results, it appears that the choice of the optimal set of columns and the amount of information strongly depend on the classes of chemical compounds which compose the library. This conclusion confirms the supposition of Weiner and Parcher (8) regarding the choice of stationary phases for each group of solutes.

Furthermore, it can be observed that a selected combination of columns never contains more than one apolar column. Undoubtedly this can be ascribed to the coverage of more factors governing the retention behavior when only one apolar column is contained in the set. This is in agreement with the high values of the correlation coefficients for apolar with other apolar columns, whereas the correlations between apolar and polar columns are relatively low. If the columns are arranged in descending order of correlation with squalane, a sequence is found almost identical to that given by McReynolds (1). The choice of only one apolar column could also be expected from the general classification given by Massart et al. (10), where apolar liquid phases such as silicones and hydrocarbons are found in one separate class. Since only one phase should be selected from a class, only one apolar phase should then be found in a set of selected phases.

It is interesting to observe that the best column, with respect to the amount of information obtained, does not necessarily belong to the best combination of 2 columns. This can be ascribed to the correlation between the columns. The results obtained from the classification by taxonomy resemble closely those obtained from the sequencing of columns by using the amount of information. The agreement between the results found by both methods is not surprising, as the information per column varies only slightly from column to column. Therefore, the correlation is also the determining factor in the information theoretical approach. In addition, numerical taxonomy was applied using taxonomic distances. In this context, it is interesting to observe the smaller agreement between the information theoretical approach and the numerical taxonomy based on these distances.

It is important to note that both procedures lead to results that depend strongly on the nature of the class of chemical compounds. Therefore, the results presented in this paper are not generally valid, but apply only to the sets of compounds and columns taken into account.
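The remark above that the best single column need not belong to the best pair can be made explicit for n = 2. The following short derivation is a sketch, assuming that Equation 2 has the form $I = \tfrac{1}{2}\,\mathrm{ld}\,(|\mathrm{cov}_m| / |\mathrm{cov}_e|)$ and that the errors on the two columns are uncorrelated with a common variance $\sigma_e^2$ (the second assumption is not stated explicitly in the text). With Equation 4,

$$|\mathrm{cov}_m| = \sigma_1^2 \sigma_2^2 - \sigma_{12}^2 = \sigma_1^2 \sigma_2^2 \left( 1 - \rho_{12}^2 \right), \qquad |\mathrm{cov}_e| = \sigma_e^4$$

so that

$$I_{12} = \tfrac{1}{2}\,\mathrm{ld}\,\frac{\sigma_1^2}{\sigma_e^2} + \tfrac{1}{2}\,\mathrm{ld}\,\frac{\sigma_2^2}{\sigma_e^2} + \tfrac{1}{2}\,\mathrm{ld}\left( 1 - \rho_{12}^2 \right) = I_1 + I_2 + \tfrac{1}{2}\,\mathrm{ld}\left( 1 - \rho_{12}^2 \right)$$

The last term is never positive and grows in magnitude with the correlation. With the rounded coefficients of Table III, $\rho = 0.53$ (columns 10 and 12) costs about 0.24 bit, whereas $\rho = 0.99$ (columns 1 and 2) costs roughly 2.8 bits. Because the single-column amounts of information differ by only a few tenths of a bit, the correlation term largely decides which pair is selected.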
Considering the resemblance between the two procedures applied to the selection and classification of columns, it seems appropriate to ask which procedure is to be preferred. As in every optimization procedure, the choice of the optimization criterion has to be considered. The choice of the amount of information as a criterion seems to be justified, as it is closely related to the usefulness of the columns for retrieval purposes and is also an indication of the average quality of the separations that can be obtained. However, it can be argued that it should not be the sole criterion.

The information theoretical approach is a selection procedure which yields an unambiguous choice of the columns. In this sense, this selection procedure can be considered unique, since all other techniques such as factor analysis, pattern recognition, principal component analysis, nearest neighbor, and numerical taxonomy are merely classification techniques. The relationship between numerical taxonomy and the other classification techniques has already been discussed in a study by Massart et al. (10) in which a classification of 226 liquid phases was carried out. Among these techniques, numerical taxonomy and pattern recognition as described by Wold (11) are the only ones yielding formal classifications. Numerical taxonomy permits one to obtain a complete hierarchical classification, which is not the case for pattern recognition. The classification obtained with numerical taxonomy facilitates the selection of columns. This is considered to be one of the main advantages of numerical taxonomy, namely, that other factors such as stability of columns, availability, and price can be taken into account, which is not the case for the information theoretical approach.

In the calculation of the amount of information, correlation coefficients have been used. Use of the same correlation coefficients as the classification parameter, in order to be able to make the optimal choice of columns, seems to be justified and yields better results than the Euclidean distance. Application of both procedures therefore necessitates (rather elaborate) calculations of correlation coefficients. The estimation of the amount of information for n columns requires an additional calculation of a series of, in our case, $\binom{16}{n}$ determinants, but the calculations can be reduced by using the selection procedure of Dupuis and Dijkstra (13). The application of numerical taxonomy requires a reduction of a 16 × 16 matrix. The mathematical background of the information theory procedure is more complex.

The final conclusion of this article is twofold. As far as the methods are concerned, it is clear that the combined use of mathematical tools such as information theory, pattern recognition, nearest neighbor calculations, principal components analysis, and numerical taxonomy permits the classification and combination of chromatographic techniques. They should therefore be of value in comparative physicochemical studies of these systems and in the selection of sets of preferred phases.

As far as the results of the GLC problem treated here are concerned, this approach can be considered as intermediate between the two extreme positions of those who propose a restricted set of stationary phases for all GLC uses, and of those who do not want any such sets, stating that special separation problems do occur which cannot be solved with such a restricted set. It is the belief of the authors of the present article that both positions are somewhat exaggerated at the present stage of development of GLC. It cannot be denied that there is a large redundancy in GLC phases. At the same time, it is clear that it is not possible to achieve all GLC separations with a restricted set. When selecting phases for separation of restricted sets of, for example, alcohols, esters, steroids, trimethylsilyl esters, etc., it should be possible to arrive at a reduced number of phases without excluding the possibility of some separations to be carried out further.
LITERATURE CITED

(1) W. O. McReynolds, J. Chromatogr. Sci., 8, 685 (1970).
(2) S. T. Preston, J. Chromatogr. Sci., 8 (Dec), 18A (1970).
(3) J. J. Leary, J. B. Justice, S. Tsuge, S. R. Lowry, and T. L. Isenhour, J. Chromatogr. Sci., 11, 201 (1973).
(4) R. A. Keller, J. Chromatogr. Sci., 11, 188 (1973).
(5) J. R. Mann and S. T. Preston Jr., J. Chromatogr. Sci., 11, 216 (1973).
(6) R. S. Henly, J. Chromatogr. Sci., 11, 221 (1973).
(7) L. Rohrschneider, J. Chromatogr., 22, 6 (1966).
(8) P. H. Weiner and J. F. Parcher, J. Chromatogr. Sci., 10, 612 (1973).
(9) S. Wold and K. Andersson, J. Chromatogr., 80, 43 (1973).
(10) D. L. Massart, M. Lauwereys, and P. Lenders, J. Chromatogr. Sci., 12, 617 (1974).
(11) S. Wold, Technical Report No. 364, Department of Statistics, University of Wisconsin, Madison, Wis., 1974.
(12) A. C. Moffat, A. H. Stead, and K. W. Smalldon, J. Chromatogr., 90, 19 (1974).
(13) P. F. Dupuis and A. Dijkstra, Anal. Chem., 47, 379 (1975).
(14) S. Hawkes, D. Grossman, A. Hartkopf, T. Isenhour, J. Leary, J. Parcher, S. Wold, and J. Yancey, J. Chromatogr. Sci., 13, 115 (1975).
(15) D. L. Massart and H. De Clercq, Anal. Chem., 46, 1988 (1974).
(16) C. E. Shannon and W. Weaver, "The Mathematical Theory of Communication", The University of Illinois Press, Urbana, Ill., 1949.
(17) P. H. A. Sneath and R. R. Sokal, "Numerical Taxonomy", W. H. Freeman, San Francisco, Calif., 1973.
(18) W. O. McReynolds, "Gas Chromatographic Retention Data", Preston Technical Abstracts Company, Evanston, Ill., 1966.
(19) T. D. Sterling and S. V. Polak, "Introduction to Statistical Data Processing", Prentice-Hall Inc., Englewood Cliffs, N.J., 1968.

RECEIVED for review March 27, 1975. Accepted July 10, 1975. The Belgian authors thank FKFO and FGWO for financial assistance.
Computed Alpha Coefficients for Electron Microprobe Analysis

D. Laguitton, R. Rousseau, and F. Claisse
Department of Mining and Metallurgy, Université Laval, Quebec, Canada

Computed α coefficients for the application of empirical equations in microprobe analysis are determined by the method developed by Rousseau and Claisse for X-ray fluorescence analysis. It is shown that the Claisse-Quintin relation with such coefficients yields results of analysis identical with those obtained by the ZAF method. Nearly as accurate results are obtained when the second-order terms relative to three elements are neglected.

The influence coefficient method (α coefficients) can be either empirical as in the Ziebold-Ogilvie (1) method, semi-empirical as in the Lachance-Traill (2) and Claisse-Quintin (3) methods, or theoretical as shown in the recent publication of Rousseau and Claisse (4). The empirical method of Ziebold and Ogilvie is based on the observation that curves of $C_A/K_A$ as a function of $C_A$ are nearly linear:

$$C_A / K_A = \alpha_{AB} + (1 - \alpha_{AB})\, C_A \tag{1}$$

where $C_A$ is the weight concentration of element A and $K_A$ is the ratio of the measured X-ray intensity of A in a binary solid solution of elements A and B and the measured intensity of A in a reference specimen of pure element A. The semi-empirical method of Lachance and Traill was first developed for X-ray fluorescence analysis but was ap-