Anal. Chem. 1989, 61, 367-375
Characterization and Prediction of Retention Behavior in Reversed-Phase Chromatography Using Factor Analytical Modeling
C. H. Lochmuller* and S. J. Breiner
Department of Chemistry, Duke University, Durham, North Carolina 27706
Charles E. Reese
Birkbeck College, University of London, London, England WC1H 0AJ
M. N. Koel
Institute of Chemistry, Estonian Academy of Science, Tallinn, Estonia, USSR 200026
The multivariate statistical techniques of principal component (or factor) analysis and target transformation factor analysis have been used to examine the reversed-phase high-performance liquid chromatography behavior of some 35 benzene derivatives in the solvent systems water/methanol/acetonitrile and water/methanol/tetrahydrofuran. The factors extracted during these analyses are linked with chemical effects related to the influences of both solvent and solute on retention behavior. Also presented is a strategy whereby the retention behavior of a wide range of solutes in diverse water/methanol/acetonitrile solvents is predicted (4% root mean square error over 0.5 < k' < 20) using a small (3 × 4) matrix of retention values in binary solvents as a model training set, along with three retention values for each compound of interest. The reliability of the prediction method is discussed and the potential of the method is demonstrated via comparisons of simulated resolution maps based on both predicted and experimental data.
0003-2700/89/0361-0367$01.50/0 © 1989 American Chemical Society

INTRODUCTION

The need for a reliable optimization/prediction strategy for reversed-phase high-performance liquid chromatography (RP-HPLC) has grown greatly in this decade, in large part due to the advent of reliable automated instrumentation. The ease with which large amounts of chromatographic data can be collected has led to attempts at prediction of RP-HPLC retention using resolution mapping (1), gradient preoptimization (2-4), and linear regression modeling. Numerous solute-related descriptors have been used to predict retention, including such parameters as hydrophobic fragmental constants, octanol-water partition coefficients, molecular connectivity indices, and other molecular parameters (5-9). A critical review of many of these efforts has been published by Kaliszan (10). Solvent effects have also been the subjects of numerous studies, with some success in the prediction of retention behavior (11-14). Unfortunately, the complexity of solute/solvent/stationary phase interactions, coupled with the growth in the use of ternary and quaternary solvent systems, has either limited the success and the generality of the various attempts or else required large numbers of experiments for reliable predictions.

We believe the major obstacles to robust predictive strategies are as follows: (a) The "true" variables that exactly define retention are unknown. In standard regression techniques, the model must specify a complete set of independent (and in this case, unknown) variables, else prediction will be poor. (b) Development of classical regression models requires a wide range of retention behavior for each solute to be predicted, so numerous experiments for each chromatographed solute are required for statistical validity. In an attempt to circumvent these obstacles, we have developed an approach toward prediction based on principal component analysis (PCA) and target transformation factor analysis (TTFA). These multivariate statistical techniques are found to be particularly suited to this problem, yielding categorical information about solutes, evidence of chemical relevance for extracted information, and rather good prediction of retention behavior using only a small training subset of the original data matrix.

The mathematical operations in PCA can be performed by using any of a number of well-established eigenvalue-eigenvector extraction algorithms. A detailed and well-documented example of one type of decomposition calculation has been published by Malinowski and Howery (15). Using PCA, one considers each of the m rows in the data matrix to be a point in a multidimensional space with coordinates defined by the values in the corresponding n columns of the data matrix. (Conversely, the columns may be represented as points with m coordinates each.) In the current study, we treat each solute as a point in a space defined by its retention coordinates (actually the Napierian logarithm of the capacity factor) along 32 solvent composition axes. Perhaps the major strength of principal component decomposition is that the technique extracts, from the data themselves, axes (or eigenvectors) that best span the data matrix. The first eigenvector is computed such that the sum of the magnitudes of the projections of all points on that vector is a maximum; in other words, as much variation in the data as possible lies along the direction of the first eigenvector. The projection of each data point on the eigenvector will be the coordinate of that datum along the vector. The second eigenvector is chosen, orthogonal to the first, so that as much as possible of the remaining variation lies along this vector. Subsequent vectors and the projections of data thereon are constructed in like manner until all the variation in the data can be described in terms of the extracted eigenvectors and associated coordinates along these vectors. The data matrix is thus decomposed into two matrices, the row cofactor matrix and the column cofactor matrix, which are composed of the coordinates and eigenvectors, respectively. When real data are examined this way, the number of eigenvectors produced typically equals either the number of rows or columns in the original matrix, whichever is smaller. The utility of PCA comes from the possibility that only a small number of these
ANALYTICAL CHEMISTRY, VOL. 61, NO. 4, FEBRUARY 15, 1989
Table I. Solutes and Solvents Used in the Study

Solvents Used in the PCA: water/methanol/acetonitrile (% H2O/% CH3OH/% CH3CN, v/v/v) and water/methanol/tetrahydrofuran (% H2O/% CH3OH/% THF, v/v/v)

40/30/30, 20/80/0, 40/45/15, 10/0/90, 40/60/0, 10/45/45, 60/30/10, 30/0/70, 0/0/100, 0/0/100, 60/40/0, 0/25/75, 30/17.5/52.5, 0/50/50, 50/0/50, 0/50/50, 30/35/35, 0/75/25, 50/12.5/37.5, 20/0/80, 0/75/25, 0/100/0, 20/20/60, 50/25/25, 0/100/0, 40/0/60, 20/40/40, 40/15/45, 20/60/20

Solvents Used as Basis Targets in Prediction Study (% H2O/% CH3OH/% CH3CN, v/v/v)

70/30/0, 40/60/0, 10/90/0, 0/50/50, 50/0/50, 20/0/80, 60/40/0, 30/70/0, 0/100/0, 70/0/30, 40/0/60, 10/0/90, 20/80/0, 0/75/25, 60/0/40, 50/50/0, 30/0/70, 0/0/100

60/0/40, 60/30/10, 50/0/50, 50/12.5/37.5, 50/25/25, 50/37.5/12.5, 50/50/0, 40/15/45, 40/30/30, 40/45/15, 40/60/0, 30/17.5/52.5, 30/35/35, 30/52.5/17.5, 30/70/0, 20/40/40, 20/60/20, 20/80/0, 10/0/90, 10/45/45, 10/67.5/22.5, 10/90/0

60/0/40, 60/10/30, 60/20/20

Solutes Used Throughout the Study (all solutes used in both solvent systems)

acetophenone            diethyl phthalate        methyl benzoate
anisole                 dimethyl phthalate       naphthalene
benzaldehyde            m-dinitrobenzene         p-nitroacetophenone
benzene                 p-dinitrobenzene         p-nitrobenzaldehyde
benzonitrile            2,4-dinitrotoluene       nitrobenzene
benzophenone            2,6-dinitrotoluene       sec-phenethyl alcohol
biphenyl                3,4-dinitrotoluene       phenyl ether
n-butylbenzene          ethylbenzene             2-phenylethyl alcohol
p-chlorobenzaldehyde    m-fluoronitrobenzene     3-phenyl-1-propanol
chlorobenzene           o-fluoronitrobenzene     n-propylbenzene
p-chlorotoluene         p-fluoronitrobenzene     toluene
o-dichlorobenzene       p-methoxybenzaldehyde

eigenvectors represent the "true" data and that additional vectors represent noise alone. After performing the decomposition of the matrix, one is left with the task of deciding how many factors are significant and how many to discard as error. If the number of factors designated as significant is smaller than the number produced by the decomposition, the inherent rank (i.e., dimensionality) of the calculated data set (i.e., the approximation of the original data using the limited number of factors) is effectively reduced relative to the rank of the original data matrix. In this way, one hopes to eliminate a portion of the noise associated with the data and, simultaneously, reduce the number of variables required for the description of points (i.e., row vectors) in the data matrix. The ease of apportionment of an extracted factor as either noise or signal depends on the experimental signal-to-noise ratio, on the presence or absence of bias, and on how adequately the data behavior is spanned by the measurement matrix. A definitive approach for specifying the "correct" number of signal factors has yet to be developed, though criteria for estimating the proper number and significance of factors have been the subjects of several publications (15-19).
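The decomposition and rank reduction described above can be illustrated with a minimal sketch. It assumes a singular value decomposition of the uncentered data matrix as the eigenvector extraction step (one of several equivalent algorithms, not necessarily the one used in this work), and it runs on synthetic data standing in for a 35 × 28 matrix of ln k' values. The function names `pca_decompose`, `reproduce`, and `rms_error` are hypothetical.

```python
import numpy as np

def pca_decompose(data):
    """Split a data matrix into row cofactors, column cofactors, eigenvalues.

    Assumption: SVD of the raw (uncentered) matrix is used as the
    eigenvalue-eigenvector extraction step.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    row_cofactors = u * s   # coordinates of each row along each eigenvector
    col_cofactors = vt      # eigenvectors, one per row
    return row_cofactors, col_cofactors, s ** 2

def reproduce(row_cofactors, col_cofactors, n_factors):
    """Rebuild the data matrix from the first n_factors factors only."""
    return row_cofactors[:, :n_factors] @ col_cofactors[:n_factors, :]

def rms_error(a, b):
    """Root mean square difference between two equally shaped arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Synthetic stand-in data (NOT the paper's measurements): three underlying
# factors plus a small amount of random noise.
rng = np.random.default_rng(0)
data = rng.normal(size=(35, 3)) @ rng.normal(size=(3, 28))
data += rng.normal(scale=0.03, size=data.shape)

rows, cols, eigenvalues = pca_decompose(data)
errors = {n: rms_error(data, reproduce(rows, cols, n)) for n in (2, 3, 4)}
for n, err in errors.items():
    print(f"{n} factors: RMS reproduction error = {err:.4f}")
```

In this synthetic case the reproduction error drops sharply once the third factor is included and changes little afterward, which is the kind of behavior one looks for when deciding how many factors to retain.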
Recent papers have examined the use of TTFA- and PCA-related methods for property estimation and classification in chromatography (20). Such desirable results as the prediction of physical properties from gas-liquid chromatography (GLC) retention indices (21) and reversed-phase liquid chromatography (RPLC) behavior (22) and the classification of reversed-phase sorbent materials themselves (23) have been demonstrated.

Target Transformation Analysis. The exploratory phase of the analysis concentrated on the cofactor matrices, in order to acquire information about the abstract mathematical structure associated with the rows or columns of the data matrix. To attach chemical significance to the abstract factors, one may presume that a column (or row) of real, measurable or calculable parameters (called a target vector) can be cast as a real factor and then test whether the hypothesis holds. This technique, known as target transformation factor analysis, uses a least-squares criterion for finding a transformation vector that best constructs the real target vector as a linear combination of abstract cofactors (15, 24). When the transformation vector is able to closely reproduce the target vector from the abstract cofactors, the target is presumed to be a real factor. If the components of the target vector and the vector constructed via transformation closely agree, the parameter in question can be reasonably treated as an underlying variable in explaining the behavior encoded in a data matrix.

EXPERIMENTAL SECTION

Chromatographic Measurements. All retention data were collected with a commercial liquid chromatograph (IBM LC-9533). A set of six columns was prepared on one day from the same lot of Whatman ODS-3 (Whatman, Clifton, NJ), a C-18 material. Mobile phases were prepared from commercial, HPLC grade solvents. Mixing proportions were determined by mass to eliminate the possibility of errors associated with the mechanical mixing capability of the pump. Void volumes were determined both by D2O and by regression of a series of alkylbenzenes. The data used for the PCA were D2O corrected. Retention measurements collected 18 months apart and on different columns from the original set agreed to within approximately 2-3%. Listings of the solutes and solvents used appear in Table I.

Data Manipulation. Computer manipulations were performed initially by using SAS statistical routines (SAS Institute, Raleigh, NC) on the Triangle Universities Computational Center's IBM system 3081. Later the appropriate subroutines contained in FACTANAL (25) and ARTHUR (Infometrix, Inc., Seattle, WA) were used in conjunction with our VAX 11/730 (Digital Equipment Corp., Maynard, MA). Plotting, printing, and data manipulations were done using Macintosh (Apple Computer, Inc., Cupertino, CA) and MS-DOS computers with a variety of commercially available software packages including the applications Macspin (D2 Software, Inc., Austin, TX), SURFER (Golden
Table II. Results of Principal Component Analyses

Water/Methanol/Acetonitrile System

no. of                  % cumulative         reduced        indicator function^b   Wold's^c
factors   eigenvalue    variance explained   eigenvalue^a   (× 10^4)               std dev
1         1533.15       89.67                1.6850         7.189                  0.4744
2         171.62        99.71                0.2019         1.341                  0.0850
3         3.66          99.92                0.0046         0.774                  0.0483
4         0.77          99.97                0.0011         0.565                  0.0456
5         0.24          99.98                0.0004         0.482                  0.0426
6         0.13          99.99                0.0002         0.425                  0.0404
7         0.09          99.99                0.0002         0.352                  0.0380
8         0.03          100.00               0.0001         0.334                  0.0391
9         0.03          100.00               0.0001         0.308                  0.0382
10        0.01          100.00               0.0000         0.312                  0.0386

Water/Methanol/Tetrahydrofuran System

no. of                  % cumulative         reduced        indicator function     Wold's
factors   eigenvalue    variance explained   eigenvalue     (× 10^4)               std dev
1         1592.52       87.29                1.7500         8.237                  0.5425
2         221.62        99.44                0.2607         1.922                  0.1205
3         7.50          99.85                0.0095         1.111                  0.0690
4         1.80          99.95                0.0024         0.738                  0.0431
5         0.59          99.98                0.0009         0.520                  0.0309
6         0.18          99.99                0.0003         0.431                  0.0236
7         0.07          99.99                0.0001         0.397                  0.0207
8         0.04          99.99                0.0001         0.389                  0.0210
9         0.03          100.00               0.0001         0.376                  0.0193
10        0.02          100.00               0.0001         0.363                  0.0175

^a Reference 18. ^b Reference 15. ^c Reference 17.

Software, Golden, CO), and SYMPHONY (Lotus Development, Cambridge, MA).
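As a rough check on how two of the Table II columns relate to the eigenvalues, the sketch below recomputes the reduced eigenvalues and the cumulative percent variance. It uses the standard reduced-eigenvalue definition, eigenvalue j divided by (r - j + 1)(c - j + 1) for an r × c data matrix. The dimensions r = 35 and c = 26 are an assumption, chosen only because they reproduce the tabulated reduced eigenvalues; the text itself mentions 28 solvent compositions. The cumulative variance is computed from the ten listed eigenvalues alone, on the assumption that the unlisted ones are negligible. Function names are hypothetical.

```python
import numpy as np

def reduced_eigenvalues(eigenvalues, n_rows, n_cols):
    """Reduced eigenvalue j = eigenvalue j / ((r - j + 1) * (c - j + 1))."""
    ev = np.asarray(eigenvalues, dtype=float)
    j = np.arange(1, ev.size + 1)
    return ev / ((n_rows - j + 1) * (n_cols - j + 1))

def cumulative_percent_variance(eigenvalues):
    """Running percentage of the summed eigenvalues."""
    ev = np.asarray(eigenvalues, dtype=float)
    return 100.0 * np.cumsum(ev) / ev.sum()

# Eigenvalues as listed in Table II.
acn_eigenvalues = [1533.15, 171.62, 3.66, 0.77, 0.24, 0.13, 0.09, 0.03, 0.03, 0.01]
thf_eigenvalues = [1592.52, 221.62, 7.50, 1.80, 0.59, 0.18, 0.07, 0.04, 0.03, 0.02]

# ASSUMPTION: a 35 x 26 data matrix (see the lead-in paragraph).
acn_rev = reduced_eigenvalues(acn_eigenvalues, 35, 26)
thf_rev = reduced_eigenvalues(thf_eigenvalues, 35, 26)
acn_cum = cumulative_percent_variance(acn_eigenvalues)
thf_cum = cumulative_percent_variance(thf_eigenvalues)

print("reduced eigenvalues (acetonitrile system):", np.round(acn_rev[:4], 4))
print("cumulative % variance (acetonitrile system):", np.round(acn_cum[:4], 2))
```

The steep fall in reduced eigenvalue after the second or third factor is one of the indicators used when judging how many factors carry signal.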
RESULTS

Principal Component Analysis. We subjected logarithmic retention data to basic PCA and found that for water/methanol/acetonitrile solvents, the two largest factors account for 99.7% of the dispersion in the behavior of 35 substituted benzenes in 28 solvent compositions. PCAs run on these data included only complete data columns so that no estimation of missing values might bias the results of our analyses. Excessively large capacity factors (k' > 10^3) precluded acquiring complete vectors of solute data for solvents incorporating more than 70% water (v/v), and these solution compositions were not included in the exploratory analyses. In water/methanol/tetrahydrofuran, the largest two factors explain some 99.5% of the variation, and addition of a third component raises this to 99.9%. Summaries of the results of the abstract factor analyses are given in Table II.

In their initial forms, the eigenvectors are abstract mathematical constructions and have no straightforward chemical or physical meaning. Despite this, we have found the first few row cofactors useful in categorizing the solutes based on their chromatographic behavior. Figure 1 shows two views of cofactors for the solutes in Table I on axes described by the first three eigenvectors extracted from the water/methanol/acetonitrile chromatographic data. Similarly, Figure 2 shows the locations of the various solute cofactors in the space of the first three principal eigenvectors for the water/methanol/tetrahydrofuran solvent system. The most striking feature that appears in these cofactor maps is, in both cases, the maintenance of a cluster pattern following the so-called "Martin Rule" (26) by the cofactors associated with benzene and the C-1 through C-4 n-alkylbenzenes. The clear linearity of factors 1, 2, and 3 with the number of carbons is suggestive of a linear free energy relationship, as has been used in several studies in attempts to model RPLC, and of the importance of molar volume in dispersion force interactions (26, 27). The fact that this linearity is preserved over the three-dimensional factor space is interesting. The implication is that all of the interactions (or at least the three significant producers of data variation) participating in retention of the n-alkylbenzenes adhere relatively strictly to such a linear relation. Additionally, there seem to be physicochemical bases associated with the magnitudes of the first extracted cofactors. The evidence for the association and the implications thereof will be discussed further in the discussion section of this paper.

Prediction of Retention Behavior by TTFA. We have developed a method for predicting behavior in RP-HPLC which we have applied to our data in water/methanol/acetonitrile solvents. In our search for a means to predict retention behavior, several criteria were deemed important. Among these were the following: (a) The method must give reasonable predictions over a wide range of solutes and solvent compositions. (b) A minimal number of experiments should be required. (c) Extension to new solvent compositions and solutes should be relatively simple within a given chromatographic system. (d) The method should be transportable to other chromatographic systems (i.e., different columns and solvent systems). Among these, the first three have been accomplished and the last is currently being investigated.

In order to determine the minimum number of factors necessary to generate a reasonable predictive model, we examined the eigenvalues, cumulative variance, reduced eigenvalues (18), and complete cross-validation (17) associated with the eigenvectors of the data set. Unfortunately, the determination of the number of factors was not considered definitive on the basis of these metrics. The results of these calculations seem to indicate a factor number of between 5
Figure 1. Mapping of solutes in the planes of the (A) first and second and (B) first and third principal eigenvectors of the water/methanol/acetonitrile solvent system. Solid lines show the linearity of the alkylbenzenes. Point numbering corresponds to the listing in Table III. (Horizontal axis in both panels: first solvent cofactor.)

Figure 2. Mapping of solutes in the planes of the (A) first and second and (B) first and third principal eigenvectors of the water/methanol/tetrahydrofuran solvent system. Solid line shows the linearity of the alkylbenzenes. Point numbering corresponds to the listing in Table III. (Horizontal axis in both panels: first solvent cofactor.)
and 9 with no clear agreement. In view of these results, we used short-circuit reproductions (15) of the data based on two, three, and four factors and found that a three-factor reproduction of the data led to a root mean square error of approximately 0.035 ln k' units, near the replicate precision of our data. We concluded that a model incorporating only three factors should reproduce the data reasonably well while keeping the model relatively simple. In this initial work, we decided to use only primary and binary solvents as targets, even for the prediction of behavior in ternary solvents. We believe these restrictions represent a "worst case" scenario of the problem and would expect that using targets employing ternary solvent information or more numerous factors might significantly improve our predictions.

Target testing and combination (see ref 15) were performed on 18 primary and binary solvent data columns (see Table I) to find the combination of three that best spans the ln k' data space. A check of all possible combinations of three generated 73 combinations whose root mean square errors in reproduction were less than 0.05 ln k' units. The set with the smallest error comprised 60%:40%:00%, 60%:00%:40%, and 00%:75%:25% v/v/v water, methanol, and acetonitrile; we
chose to use these as our basis solvents. The data matrix was then transposed, and target tests were performed by using solute data vectors as targets to find the best set of representative solutes available from the data set. A check of all combinations of three solutes in the data set (excepting n-butylbenzene, which was excluded due to missing values in its data vector) revealed o-dichlorobenzene, p-dinitrobenzene, and p-methoxybenzaldehyde as the best basis set in terms of the solutes. The set chosen for further processing was the set with the smallest root mean square error out of 29 combinations whose root mean square errors were less than 0.04 ln k' units.

On the basis of the results of the combination testing, it seems that the success of subsequent prediction should not be particularly sensitive to the specific choices made regarding basis solutes and solvents. The fact that a substantial number of combinations produce errors of approximately the same small magnitude supports this view and might well be expected if, indeed, the dominant chromatographic effects are the results of only a few factors. One would expect to see those factors at work in any representative combination of solutes and/or solvents.

The prediction strategy was begun by starting with the ln k' values for the three training solutes (rows) chromatographed in each of the three training solvents (columns). This core matrix thus required data from nine chromatograms to act as a basis set from which other solute- and solvent-related behavior might be predicted. In practice, a fourth solute, 2-phenylethyl alcohol, was added to the core matrix to prevent problems with the near-singularity of the training matrix, thereby increasing the size of the core matrix to 4 × 3. Next we constructed a column vector for each of the remaining solvents, which consisted of the retention parameters (ln k') for each of the core solutes acquired in that solvent.
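The exhaustive check of all combinations of three basis columns can be sketched as follows. This is a simplified stand-in, not the target combination procedure of ref 15: it assumes that each candidate triple of columns is scored by regressing the full data matrix directly on those three columns and taking the root mean square reproduction error. The data are synthetic and the names `combination_rms` and `best_combinations` are hypothetical.

```python
from itertools import combinations

import numpy as np

def combination_rms(data, cols):
    """RMS error when every column of data is fitted by the chosen columns."""
    basis = data[:, list(cols)]
    coeffs, *_ = np.linalg.lstsq(basis, data, rcond=None)
    return float(np.sqrt(np.mean((data - basis @ coeffs) ** 2)))

def best_combinations(data, candidates, k=3):
    """Score every k-column combination; smallest error first."""
    scored = [(combination_rms(data, c), c) for c in combinations(candidates, k)]
    return sorted(scored)

# Synthetic stand-in for a solutes x solvents matrix of ln k' values
# (three underlying factors plus noise); NOT the paper's data.
rng = np.random.default_rng(1)
data = rng.normal(size=(35, 3)) @ rng.normal(size=(3, 12))
data += rng.normal(scale=0.02, size=data.shape)

# Pretend the first six columns are the candidate basis solvents.
ranked = best_combinations(data, range(6))
best_error, best_cols = ranked[0]
print(f"{len(ranked)} combinations tested; best columns {best_cols}, "
      f"RMS error {best_error:.4f}")
```

With 18 candidate columns the same loop would test 816 triples. Transposing the matrix and repeating the search over rows gives the analogous ranking of basis solutes.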
These vectors were concatenated to the core matrix, resulting in an augmented matrix with four solute rows and with 3 + n solvent columns (3 core and n remaining solvents). The actual prediction of the retention behavior involved using the three core vectors (one for each key solvent, containing ln k' values for four core solutes) as a basis set in terms of which ln k' values for training solutes in other mobile phases were expressed. The augmented matrix was subjected to TTFA, using as targets the three core solvent vectors. The ultimate result of the target transformation was the construction of a three-component loading vector for each solvent, each component of which described the fractional contribution from one of the core solvent vectors toward the value in the new solvent. Further, the TTFA model posits that loadings for a solvent are characteristic of the solvent itself and independent of solute effects; hence the predicted value for any of the solutes in a given (modeled) solvent is simply the sum of three products

$$\text{predicted } \ln k' \text{ in solvent } x = \sum_{i=1}^{3} (\text{loading of solvent } x \text{ on core solvent } i) \times (\ln k' \text{ of solute in core solvent } i)$$
where the loadings are taken from the loading vector corresponding to the solvent of interest and the ln k' values are those measured for the solute in the core solvents. Figure 3 presents the prediction system in a tableau format. Once the core matrix is available, only three capacity factors are needed in order to model either a new solvent or solute. Generating the loadings for a new solvent is easily accomplished; the only new information needed is the set of ln k' values for the key solutes in the new solvent, since all other necessary information is held by the core matrix. A target transformation of the core matrix, augmented by the new ln k' values, yields a loading vector for the new solvent that may be used
Figure 3. Schematic layout of the components of the prediction strategy. (Panels: loadings of modeled solvents onto key solvent vectors, i.e., the previously calculated solvent loading matrix, 3 × M; known matrix of ln k' values in "core" solvents, N × 3, for new solutes 1 through N; matrix of predicted ln k' values, N × M.)

Figure 4. Plots of log (P) values vs the first solute cofactors in the water/methanol/acetonitrile and the water/methanol/tetrahydrofuran solvent systems. For the water/methanol/acetonitrile system (A), the solid line indicates the least-squares regression fit for all points. For the water/methanol/tetrahydrofuran system (B), the solid line indicates the regression fit when nitro compounds (open circles) are excluded. (Horizontal axis in both panels: first solute cofactor.)
in subsequent predictions. With the new loadings, the predicted value for any solute (for which ln k' values in the core solvents are known) is simply calculated by using the aforementioned summation. A new solute can be easily added to the prediction strategy by chromatographing it in the three key solvents. Subsequent prediction of behavior in other solvents (for which loadings are known) is, again, the simple application of the summation of products using loadings for the previously modeled solvents. The size of the prediction matrix expands as follows: Given a set of m solvent loading vectors and a set of n solutes for which capacity factors in core solvents are known, the prediction matrix will be of dimension m × n. Addition of a new solute will add predictions for every modeled solvent and, conversely, a newly modeled solvent predicts behavior for each solute.
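The loading and summation steps described above can be sketched in a few lines. The sketch assumes that, with the three core solvent vectors as targets, obtaining the loadings for a new solvent amounts to a least-squares fit of the key solutes' ln k' values in that solvent against the 4 × 3 core matrix; this is an illustrative simplification rather than the TTFA routine actually used. All numerical values are invented for the example, and the function names are hypothetical.

```python
import numpy as np

def solvent_loadings(core, key_solute_values):
    """Least-squares loadings of a new solvent on the three core solvents.

    core: (4 key solutes x 3 core solvents) matrix of ln k'.
    key_solute_values: ln k' of the same 4 key solutes in the new solvent.
    """
    loadings, *_ = np.linalg.lstsq(core, key_solute_values, rcond=None)
    return loadings

def predict_ln_k(solute_core_values, loadings):
    """Sum over core solvents of loading x (ln k' of solute in core solvent)."""
    return float(np.dot(solute_core_values, loadings))

# Invented core matrix of ln k' values (rows: key solutes, cols: core solvents).
core = np.array([[2.0, 1.5, 0.4],
                 [1.2, 0.9, 0.1],
                 [0.8, 0.3, -0.2],
                 [0.1, -0.3, -0.6]])

# Invented "measurements" of the key solutes in a new solvent, built here from
# known loadings so that the fit can be checked.
true_loadings = np.array([0.5, 0.3, 0.2])
new_solvent_values = core @ true_loadings

loadings = solvent_loadings(core, new_solvent_values)

# A new solute needs only its three ln k' values in the core solvents.
new_solute_core_values = np.array([1.0, 2.0, 0.5])
ln_k_pred = predict_ln_k(new_solute_core_values, loadings)
print("loadings:", np.round(loadings, 3))
print("predicted ln k':", round(ln_k_pred, 3), " predicted k':", round(np.exp(ln_k_pred), 2))
```

Stacking the loading vectors of m modeled solvents as columns and the core-solvent values of n solutes as rows turns the same calculation into a single matrix product that fills the whole prediction matrix at once.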
DISCUSSION

Principal Component Analysis. RPLC retention is generally thought to be dominated by the hydrophobic (or "solvophobic") effect and solute-stationary phase interactions, the former being the most effectively manipulated variable. (As a general rule, adding water increases retention.) The first solute cofactors (in either solvent system), extracted via the exploratory PCA, correlate quite well with known hydrophobicities (log (P) in octanol/water), as can be seen in Figure 4A. The correlation is initially much better for the system water/methanol/acetonitrile (R2 = 0.93) than is the case for the analogous tetrahydrofuran system (R2 = 0.72). The poorer correlation in the THF-containing system is likely the result
Figure 5. Plots of the first solvent cofactors vs water content (percent water, v/v) in (A) water/methanol/acetonitrile and (B) water/methanol/tetrahydrofuran solvents. Different point symbols represent the ratios of methanol to either (A) acetonitrile or (B) tetrahydrofuran: 1:0, 1:3, 1:1, 3:1, and 0:1. Families of like ratio are joined.

of specific interactions of THF with any nitro group(s) in the compound. When mononitro and dinitro compounds are removed from the regression set (see Figure 4B), the correlation improves markedly (R2 = 0.97).

In the context of solvent cofactors, a pattern emerges that qualitatively corresponds to the solute cofactor pattern. As would be expected, the magnitudes of the solvent cofactors vary monotonically with increasing water content, as solvophobic theory requires. It is apparent when viewing parts A and B of Figure 5 that the values of solvent cofactors depend on more than simple water content. One observes families of curves whose parameters depend in part on the relative amounts of methanol and the second modifier (either acetonitrile or tetrahydrofuran). As was the case for the solute cofactors, the THF-containing system shows a much wider range of behavior than the CH3CN-containing system. If one accepts the recent work of Scott et al. (27), then there may be a reasonable chemical basis for this. According to their work, there is evidence that methanol and tetrahydrofuran are reasonably good "complexing agents" for water while acetonitrile is far less so. This can have dramatic effects on the influence of solvent composition on the retention behavior of solutes. There are, for example, indications (14) that replacing methanol with acetonitrile, volume-for-volume, actually increases the amount of free water present in ternary mixtures. This in turn can lead to an increase in retention with the addition of the nominally stronger solvent acetonitrile (contrary to expectations). Acetonitrile-containing water/methanol systems might be expected to show better correlation to general hydrophobicity. The experience of many is that water/methanol/tetrahydrofuran mixtures give the better (or at least different) selectivity.

Prediction of Retention.
In view of the aforementioned work, it should be apparent that prediction of retention based on simple solvent-related variables will encounter difficulty when association effects are present. Given the complexity of the intrasolvent interactions and their effects on retention, we have used a modeling approach which requires no prior knowledge about the specific variables (i.e., properties or phenomena) on which retention is based. The only assumption we make regarding the underlying variables is that the effects of those variables can be adequately reflected in the chromatography of a small set (the core matrix) of retention values. In this way, we avoid the pitfalls of uncharacterized or non-orthogonal variables while satisfactorily accounting for the effects that govern retention. This predictive strategy has produced results that indicate that a factor-based modeling approach greatly simplifies prediction while retaining a predictive accuracy which rivals the best efforts of most other general predictive strategies.

The success of the predictions is graphically represented in Figure 6. The agreement between the set of predicted k' values (calculated from predicted ln k' values) and the set of experimental data is rather good (root mean square error of 6.1% of k') over capacity factors from about 0.3 to 400, in solvents ranging from 0 to 80% water and 0 to 100% acetonitrile and methanol. When the comparison is restricted to a useful range of 0.5 < k' < 20, the agreement improves somewhat (to a root mean square error of 4.9% of k'). More detailed information is presented in Table III, including the root mean square and worst errors for each solute. A general pattern that emerged while using the TTFA approach is that predictions are generally very good but begin to deteriorate as capacity factors increase in magnitude. This trend is not entirely unexpected, and we believe it to be the result of the interplay between two agents: (1) the inherent increase in relative error associated with the measurement of retention time as k' becomes large and (2) the exponential compounding of small errors in the core matrix as larger values of k' are predicted. Although comparisons of predicted with experimental capacity factors are instructive, the true utility of this approach only becomes apparent when the results of predictions are taken in context with other predicted values.
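Because the model works in ln k' and predictions are exponentiated to give k', a fixed error in ln k' becomes an error in k' that scales with k' itself. The short sketch below illustrates this arithmetic only; it is not an error analysis taken from this work. The 0.035 figure is the three-factor reproduction error quoted earlier, and the function names are hypothetical.

```python
import math

def relative_error_in_k(delta_ln_k):
    """Relative error in k' caused by an error delta_ln_k in ln k'."""
    return math.expm1(delta_ln_k)

def absolute_error_in_k(k, delta_ln_k):
    """Absolute error in k' for a given k' and error in ln k'."""
    return k * math.expm1(delta_ln_k)

delta = 0.035  # error in ln k' units (the three-factor reproduction error)
print(f"relative error in k': {100 * relative_error_in_k(delta):.2f}%")
for k in (0.5, 5.0, 20.0, 400.0):
    print(f"k' = {k:6.1f}: absolute error = {absolute_error_in_k(k, delta):.3f}")
```

An error of 0.035 in ln k' corresponds to roughly 3.6% in k' at any retention, so the absolute error at k' = 20 is forty times that at k' = 0.5.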
To test the utility of the predictive scheme in RPLC, we have generated critical resolution maps (critical resolution as a function of solvent composition) for randomly chosen sets of solutes (excluding those four which were part of the core matrix), presuming unit volume chromatographic peaks. In each case the horizontal axis represents the water content (% v/v) and the vertical axis the methanol content. The amount of acetonitrile in the solvent is readily calculated as the difference between 100% and the sum of the other two components. Resolution surfaces were generated for simulated mixtures of four and six components, respectively. In each case, surfaces were generated for both the predicted and experimental resolutions, and an error surface (R_experimental - R_predicted) was
Figure 6. Plots of predicted capacity factors vs experimental capacity factors over the ranges (A) 0.2 < k' < 400 on a logarithmic scale (R2 = 0.990, n = 1068) and (B) 0.5 < k' < 20 on a linear scale (R2 = 0.993, n = 757).
Table III. RMS Errors in Predicted Capacity Factors in H2O/CH3OH/CH3CN over the Range 0.5 < k' < 20

no.  solute                    % RMS error  % worst rel error  associated     associated
                                            for solute         solvent        k'
1    acetophenone              5.6          -11.5              70/30/0        14.60
2    anisole                   3.9          10.7               70/22.5/7.5    18.86
3    benzaldehyde              5.4          9.3                60/30/10       4.40
4    benzene                   6.3          23.0               70/22.5/7.5    15.48
5    benzonitrile              7.1          11.5               40/45/15       1.38
6    benzophenone              5.2          14.6               40/60/0        7.46
7    biphenyl                  4.2          -9.7               10/90/0        1.08
8    n-butylbenzene^a
9    p-chlorobenzaldehyde      3.5          7.1                10/90/0        0.54
10   chlorobenzene             2.1          -3.8               50/50/0        12.20
11   p-chlorotoluene           2.6          -6.1               30/70/0        4.30
12   o-dichlorobenzene^b
13   diethyl phthalate         7.6          20.0               40/60/0        4.61
14   dimethyl phthalate        9.0          19.8               40/60/0        1.88
15   m-dinitrobenzene          3.4          -6.8               40/60/0        1.96
16   p-dinitrobenzene^b
17   2,4-dinitrotoluene        2.8          -5.8               40/45/15       2.73
18   2,6-dinitrotoluene        2.9          5.7                20/0/80        0.79
19   3,4-dinitrotoluene        7.5          16.7               30/70/0        0.90
20   phenyl ether              3.2          7.0                40/0/60        5.20
21   ethylbenzene              4.3          -10.6              30/70/0        4.24
22   m-fluoronitrobenzene      3.6          11.1               70/22.5/7.5    16.34
23   o-fluoronitrobenzene      4.8          8.5                20/60/20       0.55
24   p-fluoronitrobenzene      2.8          8.7                70/22.5/7.5    13.87
25   p-methoxybenzaldehyde^b
26   methyl benzoate           3.8          6.3                20/60/20       0.62
27   naphthalene               1.9          3.4                10/90/0        0.87
28   p-nitroacetophenone       3.7          -7.4               70/22.5/7.5    12.66
29   p-nitrobenzaldehyde       6.7          11.1               50/37.5/12.5   2.42
30   nitrobenzene              3.7          8.2                70/22.5/7.5    12.66
31   sec-phenethyl alcohol     5.3          15.3               80/20/0        19.84
32   2-phenylethyl alcohol^b
33   3-phenyl-1-propanol       4.5          -7.5               60/30/10       7.92
34   n-propylbenzene           6.4          -12.2              30/70/0        6.91
35   toluene                   4.1          -10.1              30/70/0        2.80

^a Training k' were not available for this solute. ^b This solute was in the training set and was, therefore, not included in the evaluations.

generated for comparison purposes. Figures 7 and 8 show graphically the results of these calculations in the form of contour plots for each surface. Although predictions were made for a wider range of solvents than that for which complete experimental data were available, we have displayed only those areas within which direct comparisons could be made.
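As a rough illustration of how the error statistics of Table III and the surfaces of Figures 7 and 8 could be computed, the following Python sketch may help. It is not the authors' program: the sign convention of the relative error, the fixed plate count `plates` (used here in place of the unit-volume peaks presumed in the text), the resolution expression, and all function names are assumptions introduced for this example.

```python
import math


def percent_rms_error(k_exp, k_pred):
    """Percent RMS of the relative errors between paired k' values.

    Assumed definition; the relative error is taken as (pred - exp) / exp.
    """
    rel = [(p - e) / e for e, p in zip(k_exp, k_pred)]
    return 100.0 * math.sqrt(sum(r * r for r in rel) / len(rel))


def worst_relative_error(k_exp, k_pred, solvents):
    """Return (percent error, solvent) for the largest-magnitude relative error."""
    errors = [(100.0 * (p - e) / e, s) for e, p, s in zip(k_exp, k_pred, solvents)]
    return max(errors, key=lambda item: abs(item[0]))


def acetonitrile_percent(water, methanol):
    """Acetonitrile content as the remainder of the ternary mixture (% v/v)."""
    return 100.0 - water - methanol


def pair_resolution(k1, k2, plates=10000.0):
    """Resolution of two peaks from their capacity factors.

    Assumes Rs = (sqrt(N) / 2) * (k_hi - k_lo) / (k_lo + k_hi + 2), which follows
    from Rs = 2 * (t2 - t1) / (w1 + w2) with the same plate count N for both peaks.
    """
    k_lo, k_hi = sorted((k1, k2))
    return 0.5 * math.sqrt(plates) * (k_hi - k_lo) / (k_lo + k_hi + 2.0)


def critical_resolution(capacity_factors, plates=10000.0):
    """Smallest resolution between neighbouring peaks at one solvent composition."""
    ks = sorted(capacity_factors)
    return min(pair_resolution(a, b, plates) for a, b in zip(ks, ks[1:]))


def error_surface(k_exp_map, k_pred_map, plates=10000.0):
    """R_experimental - R_predicted at every grid point present in both maps.

    Each map goes from (water %, methanol %) to a list of k' values.
    """
    return {
        point: critical_resolution(k_exp_map[point], plates)
        - critical_resolution(k_pred_map[point], plates)
        for point in k_exp_map
        if point in k_pred_map
    }
```

Sorting the capacity factors at each grid point before pairing neighbours lets the sketch follow elution reversals, and restricting the error surface to points present in both maps mirrors the choice to display only regions where direct comparison is possible.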
Figure 7. Contour maps for a simulated four-component mixture showing (A) resolution based on experimental capacity factors, (B) resolution based on predicted capacity factors, and (C) A - B. Horizontal axis: percent water (v/v).

Figure 8. Contour maps for a simulated six-component mixture showing (A) resolution based on experimental capacity factors, (B) resolution based on predicted capacity factors, and (C) A - B. Horizontal axis: percent water (v/v).

The results shown for the four-component case are the critical resolution and error surfaces generated for a simulated mixture of benzophenone, benzaldehyde, acetophenone, and naphthalene. Visual comparison of the experimental (Figure 7A) and predicted (Figure 7B) surfaces shows excellent agreement in the general shape of the surface. Although the predicted surface shows a smaller absolute resolution in the vicinity of the maximum than the experimental surface, the location of the optimum, in terms of the solvent axes, is virtually identical with that of the experimental result. Examination of the error contour surface (Figure 7C) reinforces this contention and, additionally, shows the excellent agreement associated with the regions outside the optimal range.

The six-component simulation was based on data and predictions for the compounds p-nitroacetophenone, ethylbenzene, phenyl ether, benzonitrile, dimethyl phthalate, and p-nitrobenzaldehyde. As might be expected, more complex surfaces (Figure 8A,B) result from the greater possibility for elution reversal. The error surface (Figure 8C) shows a correspondingly more complex situation, with errors of alternating sign in different portions of the surface. Despite the somewhat larger magnitudes (compared to the four-component case) of the error in predicted resolution, the position of the global optimum was predicted with acceptable accuracy even in the six-component case.

These preliminary studies of the predictive capability of factor analytical modeling have shown the potential value of the multivariate approach to the prediction of chromatographic behavior. Our FA model, as opposed to traditional regressive approaches, demonstrates that the information contained in an objectively selected, small chromatographic data set (limited to isocratic, binary solvents and few solutes) extends well beyond the directly observed behavior and, in some respects, adequately describes the complete range of useful reversed-phase chromatography for the solvent systems and solute classes studied. One should be cautioned, however, that our set of solutes in no way includes all possible types of compounds, and extrapolation of the method to other solute classes should be well tempered with validation by experiment.
Anal. Chem. 1989, 61, 375-382
Nevertheless, it is important to recall the severe restrictions on information incorporated into this predictive model; one might naturally expect more reliable predictions to be a result of either (a) using a larger number of characteristic solvents (and solutes) or (b) allowing ternary solvents to act as members of the basis set. Current work is under way to extend the predictive method to other column/solvent systems with the inclusion of more complicated and varied solute classes.

LITERATURE CITED
(1) Glajch, J. L.; Kirkland, J. J.; Squire, K. M.; Minor, J. M. J. Chromatogr. 1980, 199, 57.
(2) Snyder, L. R.; Dolan, J. W.; Gant, J. R. J. Chromatogr. 1979, 165, 3.
(3) Snyder, L. R.; Dolan, J. W.; Gant, J. R. J. Chromatogr. 1979, 165, 31.
(4) Quarry, M. A.; Grob, R. L.; Snyder, L. R. Anal. Chem. 1986, 58, 907.
(5) Koopmans, R. E.; Rekker, R. F. J. Chromatogr. 1984, 285, 267.
(6) Jinno, K.; Kawasaki, K. Chromatographia 1984, 18, 90.
(7) Bylina, A.; Gluzinski, L.; Radwanski, B. Chromatographia 1983, 17, 132.
(8) Wise, S.; Sander, L. C. HRC CC, J. High Resolut. Chromatogr. Chromatogr. Commun. 1985, 8, 248.
(9) Hanai, T.; Tran, C.; Hubert, J. HRC CC, J. High Resolut. Chromatogr. Chromatogr. Commun. 1981, 4, 454.
(10) Kaliszan, R. Crit. Rev. Anal. Chem. 1986, 16, 323.
(11) Schoenmakers, P. J.; Billiet, A. H.; de Galan, L. J. Chromatogr. 1981, 218, 261.
(12) Schoenmakers, P. J.; Billiet, A. H.; de Galan, L. J. Chromatogr. 1982, 282, 107.
(13) Schoenmakers, P. J.; Billiet, A. H.; de Galan, L. Chromatographia 1982, 15, 205.
(14) Lochmuller, C. H.; Hamzavi-Abedi, M.; Chu-Xiang, O. J. Chromatogr. 1988, 387, 105.
(15) Malinowski, E. R.; Howery, D. G. Factor Analysis in Chemistry; Wiley: New York, 1980.
(16) Malinowski, E. R. Anal. Chem. 1977, 49, 606.
(17) Wold, S. Technometrics 1978, 20, 397.
(18), (19) [entries illegible in the source]
(20) Howery, D. G.; Soroka, J. M. J. Chemom. 1987, 1, 91.
(21) Weiner, P. H.; Howery, D. G. Anal. Chem. 1972, 44, 1189.
(22) Kindsvater, J. H.; Weiner, P. H.; Klingen, T. J. Anal. Chem. 1974, 46, 982.
(23) Weiner, P. H.; Parcher, J. F. Anal. Chem. 1973, 45, 302.
(24) Weiner, P. H.; Malinowski, E. R.; Levinstone, A. J. Phys. Chem. 1970, 74, 4537.
(25) Malinowski, E. R.; Howery, D. G.; Weiner, P. H.; Soroka, J. R.; Funke, P. T.; Seizer, R. S.; Levinstone, A. FACTANAL, Program 320, Quantum Chem. Program Exchange, Indiana University, Bloomington, IN, 1976.
(26) Martin, A. J. P. Biochem. Soc. Symp. 1949, 3, 4.
(27) Katz, E. D.; Lochmuller, C. H.; Scott, R. P. W., submitted for publication in J. Chromatogr.
RECEIVED for review August 16, 1988. Accepted November 15, 1988. This work was supported, in part, by a grant from the National Science Foundation, Grant No. CHE-8500658 (to C.H.L.).
Influence of Sample Concentration and Adsorption Time on the Yield of Biomolecule Ions¹ in Plasma Desorption Mass Spectrometry
A. Grey Craig* and Hans Bennich
Department of Immunology, Box 582, Uppsala University, S-751 23 Uppsala, Sweden

The yield of intact ions formed in ²⁵²Cf plasma desorption mass spectrometry has been investigated by analyzing melittin and bovine trypsin at different concentrations on a nitrocellulose surface. The yield of trypsin ions is shown to vary with the protein concentration and adsorption time. The singly charged ion of trypsin is observed when concentrated solutions (10 µM to 1 mM) of bovine trypsin are applied for sufficient time.
* Author to whom correspondence should be addressed.
¹ This term refers to a charged molecule, but the resolution and mass accuracy of the instrument used are not sufficient to determine whether the single (or multiple) positive charge is due to the loss of one (or more) electron(s) or to the addition of one (or more) proton(s). The same caveat holds for the notations M+, M2+, etc., in the figures.

INTRODUCTION
Plasma desorption mass spectrometry (PDMS) (1) utilizes ²⁵²Cf fission fragments as primary ions to bombard solid samples, producing secondary ions of the sample. The secondary ions are mass analyzed by their time of flight. Soon after its discovery, PDMS was recognized to enable intact desorption and ionization of labile molecules (2). The measurement of insulin, and thereafter a series of higher mass proteins including porcine trypsin, illustrated the potential for mass determination of biomolecules (3-5). Recently the measurement of porcine pepsin has demonstrated the ability of PDMS to ionize intact molecules above 30 kdaltons (6).

Early measurements with this technique used the electrospray sample preparation method (7), where a solution of the sample is sprayed from a capillary, set at a high potential, onto a thin, grounded aluminum or aluminized Mylar foil. Alternative sample preparation methods have been developed to enhance the molecule ion yield. These include the use of insoluble matrices such as Nafion (8) and nitrocellulose (10), or coelectrospraying the analyte with a glutathione matrix (9). The (electrosprayed) nitrocellulose film is suitable for adsorbing large peptide and protein samples (10). A distinct feature of the spectra obtained when using a glutathione matrix or a nitrocellulose film is the presence of intense multiply charged molecule ions (9,10). In the case of nitrocellulose this enhancement has been suggested as resulting from the formation of preformed ions (10), while glutathione was proposed to decrease the electrostatic attraction of preformed ions, or alternatively enable desorption of neutral ion pairs which subsequently dissociate (9). The similarity between the spectra observed when using a glutathione matrix and a nitrocellulose film has been noted (9).

The yield of intact ions formed by using PDMS together with the nitrocellulose sample preparation method has been investigated (10-13). Previously, the yield of doubly charged ions of bovine insulin was found to be less dependent on the amount of sample deposited on nitrocellulose, as compared to the yield of singly charged ions (13). In this paper the effect of sample concentration on the behavior of the singly and multiply charged ions of melittin and trypsin is investigated.

MATERIALS AND METHODS
Measurements were made with a BIOION 20 mass spectrometer (Bio-Ion Nordic AB, Uppsala, Sweden). The instrument was operated at +20 kV accelerating potential; spectra were accumulated for 10⁶ primary ions for melittin and 2 × 10⁶ primary ions for bovine trypsin. The nitrocellulose foils were prepared as previously described (13). Spectra were calibrated by using
0003-2700/89/0361-0375$01.50/0 © 1989 American Chemical Society