Optimization in Irregularly Shaped Regions: pH and Solvent Strength

Anal. Chem. 1994, 66, 893-904

Optimization in Irregularly Shaped Regions: pH and Solvent Strength in Reversed-Phase High-Performance Liquid Chromatography Separations
B. Bourguignon, P. F. de Aguiar, M. S. Khots, and D. L. Massart*
Vrije Universiteit Brussel, Farmaceutisch Instituut, Laarbeeklaan 103, B-1090 Brussels, Belgium

The feasible region in which experimental optimization can be carried out is often irregular in shape, so that the performance of the usual symmetric experimental designs is disappointing. This is illustrated by the optimization of pH and solvent strength in HPLC. The definition of the feasible region is described. This region is searched with a set of uniformly spaced informative experiments determined with the Kennard and Stone algorithm. The optimization procedure is sequential, and the quality of each separation is judged by means of a multicriteria decision-making method. If none of the experiments leads to a chromatogram with an acceptable combination of optimization goals, then better experimental conditions are searched by following chromatographic expertise. The possibility of using D-optimal designs to model the chromatographic parameters over the feasible region is also discussed. As an example, pH and solvent strength of the mobile phase are optimized in the reversed-phase liquid chromatography separation of a mixture of four chlorophenols.

Experimental design methodology has been used in analytical chemistry for many years (refs 1 and 2). Conventionally in this field, in a very large majority of cases, this methodology has been restricted to the use of simple two-level factorial designs or more-level designs such as central composite designs or, recently, Doehlert designs (refs 3 and 4). These spherical designs have much appeal because of their symmetrical and geometrically pleasing appearance. In practice, the performance of these designs is often disappointing. There are several reasons for that, namely: (a) Constraints occur under which a regularly spaced design will not cover an irregularly spaced region well. (b) A single criterion is optimized, while the optimization is inherently multicriteria in nature. (c) The model to be described is approximated with first- or second-order (quadratic) equations, while in reality a different model applies. In analytical chemistry this is often the case when pH is involved, since sigmoid relationships are obtained that can be approximated by quadratic equations only in narrow variable ranges. (d) The optimization time is rather long, as all experiments of the design have to be performed prior to modeling and searching for optimal experimental conditions. We have tried to develop a different approach, which uses an experimental design philosophy with less emphasis on modeling.

(1) Morgan, S. L.; Deming, S. N. Anal. Chem. 1974, 46, 1170-1181.
(2) Morgan, S. L.; Deming, S. N. J. Chromatogr. 1975, 112, 267-285.
(3) Bourguignon, B.; Marcenac, F.; Keller, H. R.; de Aguiar, P. F.; Massart, D. L. J. Chromatogr. 1993, 628, 171-189.
(4) Hu, Y. H.; Massart, D. L. J. Chromatogr. 1989, 485, 311-323.

0003-2700/94/0366-0893$04.50/0 © 1994 American Chemical Society

The point of departure of our reasoning is that the best chromatographic experts are able to perform the optimization of complex separations, and therefore it is advisable to follow the lead of those experts. This means that one should formalize the way a chemist thinks and adapt chemometric techniques to it (and not vice versa). The way a chemist thinks about separations is very often a multistep procedure consisting of the following.
Step 1: A definition of a region in which all acceptable solutions and, therefore, the best one must be found. In contrast with earlier publications on experimental design applied to analytical chemistry, we do not suppose that this is necessarily a rectangular or circular area. In practice, the geometry is much more complex.
Step 2: A set of informative experiments to approach the optimum, to know whether there is more than one such optimum (local versus global optimum), and to get a feeling for the importance of the parameters.
Step 3: A conclusion based on the results of step 2, reached by one of the following.
Step 3a: Narrowing down the region where acceptable solutions are found (informative approach).
Step 3b: Trying to develop a model that allows prediction of the optimum (modeling approach).
Step 1 should take care of reason a for bad performance; steps 2 and 3 should take into account reason c. Reason b is taken care of by using a multicriteria decision-making (MCDM) procedure described earlier in ref 5. Reason d may be avoided by steps 2 and 3a. In this paper, we shall focus on the informative approach. As an application to explain this approach in detail, pH and solvent strength were optimized in the reversed-phase high-performance liquid chromatography separation of a mixture of three tetrachlorophenols and pentachlorophenol.

THEORY
Optimization Strategy. A general outline of the optimization procedure proposed in this paper is provided in Figure 1.
Step 1: Defining a Feasible Region. Step 1 consists of defining the area where all substances have acceptable capacity factors (k') (Figure 1). In what follows, we will call this the feasible region. This region is often far from spherical, as was the case in ref 3, where the authors optimized the separation of the same mixture of chlorophenols by applying a Doehlert design. The feasible region they obtained and the experiments that were performed are shown in Figure 2. Only two of the seven experiments of the design belong to the feasible region.
When applying a classical design, i.e., a design with a fixed form such as a central composite or Doehlert design, two situations are possible. The feasible region may be large enough to insert such a design. This might in some cases result in excluding a considerable part of the area from investigation, as shown in Figure 3. When the feasible region is too small, the design will cover an area larger than the feasible region. This will result in shorter or longer retention times than desired for one or more compounds.
To define a region, upper and lower limits of pH are selected by considering the pKa values of the compounds and the stability of the stationary phase (Figure 4). If one wants to consider the pH as an experimental variable, one must of course make sure that the pH limits include the pKa value of at least one substance. The following step is to construct a retention boundary map for each substance. To that end, at both limits of pH, volume fractions of the organic modifier are searched for each compound that should lead to the smallest and largest acceptable capacity factors, for instance 1 and 10.

(5) Bourguignon, B.; Massart, D. L. J. Chromatogr. 1991, 586, 11-20.

Analytical Chemistry, Vol. 66, No. 6, March 15, 1994 893

[Figure 1. General outline of the optimization procedure described in this paper. Flowchart labels recoverable from the source: Define a feasible region; Region exists; Other experiments; Define a set of experiments in the region; Informative approach; Symmetrical design: logistic transform; Nonsymmetrical design: D-optimal.]

[Figure 2. Example of an irregular feasible region. The points constitute a Doehlert design. Axes: pH and % CH3CN.]

[Figure 3. Example of a feasible retention region with two Doehlert designs inserted as well as possible. Points a-g constitute one Doehlert design (dotted lines); the other design is constituted by points 1-7 (solid lines). Axes: pH and % ACN.]

[Figure 4. Detailed outline of the optimization procedure.
Step 1: defining a feasible region. Define limits of pH (pKa of the solutes, stability of the stationary phase). Define limits of organic modifier concentration (two gradient runs at each limit of pH; Drylab I Plus; establish isocratic solvent strength for k' = 1 and k' = 10). Construct retention boundary map. Apply vertex algorithm to define a feasible region in the retention boundary map.
Step 2: defining a set of experiments. Apply grid algorithm. Apply Kennard and Stone algorithm. Check after each experiment whether the obtained result meets the performance goals by applying the method of Derringer. Perform a next experiment only if this is not the case.
Step 3: narrowing down the feasible region. If no acceptable separation, then: apply chromatographic reasoning; perform additional experiment(s); apply method of Derringer.]

Two gradient runs at each pH are applied, scanning the possible solvent strength range, as proposed by Snyder (ref 6). With the measured retention times for each compound, two isocratic mobile-phase compositions are calculated with the Drylab I Plus software (ref 7) to give the solvent strengths at which k' = 1 and k' = 10. For each compound, the volume fractions of the modifier that lead to k' = 1 at both limits of pH are connected by a straight line, and the same is done for k' = 10, so that an experimental region with acceptable retention is first defined for each solute separately. Such a linear interpolation is only an approximation but was shown to establish an area with sufficient accuracy and with a minimum of experimental effort (ref 3). A feasible region for the whole mixture is then determined by overlapping the regions of all compounds. If there is no region of overlap, which may be due to large differences in the retention behavior of the compounds of the mixture, a feasible region cannot be established for all compounds. In that case, one may consider applying a feasible region where retention is acceptable for most compounds of the mixture, or a different chromatographic approach should be followed (Figure 1).

(6) Snyder, L. R.; Dolan, J. W.; Lommen, D. C. J. Chromatogr. 1989, 485, 65-89.
(7) Drylab I Plus, L. C. Resources Inc., Lafayette, IN, 1989.

Consider, for instance, Figure 5, panels a-e. In Figure 5a, points 1 and 2 represent the volume fractions of the modifier that should result in k' = 1 for a certain compound A at the upper and lower limits of pH. Points 3 and 4 are the volume fractions that lead to k' = 10 at the two limits of pH. Points 1 and 2 and points 3 and 4 are connected by the straight lines a and d, respectively, so that a feasible region is obtained for compound A (Figure 5a). In Figure 5, panels b-d, the pairs of lines b and e, c and f, and g and h delimit a region with 1 < k' < 10 for three other compounds (B-D). The feasible region for the four compounds that results from overlapping the four feasible regions is shown in Figure 5e. One should note the irregular shape of the resulting feasible region.

To find the vertices describing the feasible region for all compounds, the following algorithm, which we shall call the vertex algorithm, was developed. This algorithm is outlined in Figure 6 and proceeds in three steps. First, it is investigated whether there are vertices on the pH boundaries. This is done by searching, at each pH limit, for the smallest volume fraction of modifier that leads to k' = 1 for any compound and the largest volume fraction that leads to k' = 10 for any compound. If the smallest volume fraction for k' = 1 is not larger than the largest volume fraction for k' = 10, then there is no modifier concentration at that pH at which all substances elute with 1 < k' < 10. If it is larger, then these two concentrations are two vertices of the polygon. Let us consider Figure 5e as an example. The feasible region that results from the four retention boundary maps (Figure 5, panels a-d) is a quadrangle with points 1, 2, 5, and 6 as vertices (Figure 5e). At the lower limit of pH, the smallest volume fraction for k' = 1 is due to substance D (point 4 in Figure 5e) and is smaller than the largest volume fraction for k' = 10 (point 3, due to substance A); it is therefore decided that there are no vertices on this limit of pH. A similar reasoning indicates points 1 and 2 as vertices of the polygon at the upper limit of pH. In the two following steps, the other vertices, which are not on the boundaries of pH, are determined. This is based on a reasoning similar to McLean and Anderson's extreme vertex algorithm for mixture designs (ref 8). One first determines all the points that might be vertices. These are all the intersection points, between the limits of pH, of the lines with k' = 1 and k' = 10. In our example, points 5-9 are the intersection points between the limits of pH that have to be examined as possible vertices. In the last step, it is investigated which of these intersection points is indeed a vertex of the feasible region by applying the following constraints: for lines of k' = 1

$$(\text{volume fraction of modifier})_{(\text{intersection point})\,j} \le b_{0i} + b_{1i} \times \mathrm{pH}_{(\text{intersection point})\,j} \qquad (1)$$

where j is the intersection point and i is the solute; for lines of k' = 10

$$(\text{volume fraction of modifier})_{(\text{intersection point})\,j} \ge b_{0i} + b_{1i} \times \mathrm{pH}_{(\text{intersection point})\,j} \qquad (2)$$

[Figure 5. Construction of a retention boundary map. Panels a-d show feasible regions of different solutes (A-D). In panel e the feasible region that the four solutes have in common is shaded. I and X indicate capacity factors k' = 1 and k' = 10. 1-9 are points mentioned in the text. Axes: pH and % CH3CN.]

(8) McLean, R. A.; Anderson, V. L. Technometrics 1966, 8, 447-454.
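As a minimal sketch of the first step of the vertex algorithm, the Python fragment below builds the straight boundary lines from the modifier fractions found at the two pH limits and then checks whether a pH limit carries two vertices. The class and function names and all numerical values are hypothetical illustrations, not the authors' K&S program or their data.

```python
from dataclasses import dataclass


@dataclass
class Line:
    """Straight boundary phi = b0 + b1 * pH for one solute (phi in % modifier)."""
    b0: float
    b1: float

    def at(self, ph: float) -> float:
        return self.b0 + self.b1 * ph


def line_through(ph_low, phi_low, ph_high, phi_high) -> Line:
    """Line connecting the modifier fractions found at the two pH limits."""
    b1 = (phi_high - phi_low) / (ph_high - ph_low)
    return Line(b0=phi_low - b1 * ph_low, b1=b1)


def vertices_on_ph_limit(ph, k1_lines, k10_lines):
    """Return the two vertices lying on a pH limit, or [] if there are none."""
    upper = min(line.at(ph) for line in k1_lines)    # smallest fraction giving k' = 1
    lower = max(line.at(ph) for line in k10_lines)   # largest fraction giving k' = 10
    if upper <= lower:
        return []  # no modifier fraction gives 1 < k' < 10 for every solute
    return [(ph, lower), (ph, upper)]


# Hypothetical two-solute example between pH 4.5 and pH 7.0.
K1_LINES = [line_through(4.5, 60.0, 7.0, 40.0), line_through(4.5, 50.0, 7.0, 45.0)]
K10_LINES = [line_through(4.5, 35.0, 7.0, 15.0), line_through(4.5, 30.0, 7.0, 25.0)]

if __name__ == "__main__":
    for ph_limit in (4.5, 7.0):
        print(ph_limit, vertices_on_ph_limit(ph_limit, K1_LINES, K10_LINES))
```

The comparison `upper <= lower` mirrors the rule in the text: vertices exist on a pH limit only when the smallest fraction for k' = 1 is strictly larger than the largest fraction for k' = 10.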

Equations 1 and 2 are the equations of the lines that relate, for each solute i, the composition required for k' = 1 and k' = 10, respectively, to the pH. The maximum number of such constraints is twice the number of compounds. To investigate whether an intersection point j is a vertex, the pH value of this point is substituted in eqs 1 and 2 of all substances, and the volume fractions of the modifier are computed. To belong to the feasible region, the volume fraction of the modifier of an intersection point should be smaller than or equal to the volume fraction that results for all lines of k' = 1 and larger than or equal to the volume fraction calculated for all lines of k' = 10. In this way, it is investigated whether an intersection point is in the feasible region of each compound separately. If this is the case, then the intersection point is considered to be a vertex. In the example, eight equations are applied, namely, those of lines a-h. For intersection point 7 (pH = 5.1, f = 38%), one finds that the constraint described by line d is not fulfilled, since at pH = 5.1 f(d) = 23%. Therefore point 7 is not a vertex. Only points 5 and 6 fulfill all the constraints and thus are vertices of the polygon.

[Figure 6. Outline of the vertex algorithm. Flowchart text recoverable from the source: (1) determination of vertices on the limits of pH; an intersection point is considered a vertex only if it fulfills the following constraints: for lines of k' = 1, volume fraction of modifier(point) ≤ b0 + b1 × pH(point); for lines of k' = 10, volume fraction of modifier(point) ≥ b0 + b1 × pH(point).]

Step 2: Defining a Set of Experiments. In step 2, a set of experimental points is mapped in the feasible region. This offers the advantage over designs made specifically for rectangular or spherical regions that no experiments fall outside the feasible region. The number of experiments is the number that the analyst is willing to perform. This is not always the case when applying experimental designs! Before defining experiments in the feasible region, it has to be made clear which experimental conditions might be selected as possible points. Therefore, the whole feasible region is covered by a grid of 0.1 pH unit and 1% volume fraction of the modifier. An algorithm, which we shall call a grid algorithm, is applied to investigate which points of the grid, constrained by the maximum and minimum values of pH and volume fraction of organic modifier of the feasible region, fall inside this region. This is done with the equations defining the boundaries of the feasible region, which are some of the eqs 1 and 2 used in the vertex algorithm, namely, those that connect the eventually retained vertices. If there are two vertices on the same limit of pH, eq 3 and/or eq 4 is also required for the upper and/or lower pH limit, respectively:

$$\mathrm{pH} \le \text{maximum value of pH} \qquad (3)$$

$$\mathrm{pH} \ge \text{minimum value of pH} \qquad (4)$$

All possible pH values between the smallest and the largest pH of the feasible region are substituted in these equations. To belong to the feasible region, the volume fraction of the modifier of the considered point should be smaller than or equal to the volume fraction calculated for boundaries corresponding to k' = 1 and larger than or equal to the volume fraction obtained with the equation of a boundary for k' = 10. A point is part of the feasible region only if it fulfills the constraints imposed by all sides.

Once the candidate points are defined by the grid algorithm, one selects the number of experiments one accepts to carry out. We wanted a "design" with the following properties:

Sequentiality. A sequential approach offers the advantage that not all experiments of the design necessarily have to be performed; if a sufficiently good response is obtained for one experimental point, one can stop the optimization. The best-known sequential approach is the simplex optimization procedure (ref 9), but in this procedure, changes in the elution order may occur and lead to a local rather than to the global optimum.

(9) Spendley, W.; Hext, G. R.; Himsworth, F. R. Technometrics 1962, 4, 441.
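The constraint test shared by the vertex algorithm and the grid algorithm described above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: boundary lines are given as hypothetical (b0, b1) pairs, the function names are invented here, and the grid steps of 0.1 pH unit and 1% modifier are taken from the text.

```python
def intersection(line_a, line_b):
    """Intersection (pH, phi) of two lines given as (b0, b1); None if parallel."""
    (a0, a1), (c0, c1) = line_a, line_b
    if a1 == c1:
        return None
    ph = (c0 - a0) / (a1 - c1)
    return ph, a0 + a1 * ph


def inside(ph, phi, k1_lines, k10_lines, tol=1e-9):
    """Eqs 1 and 2: on or below every k' = 1 line, on or above every k' = 10 line."""
    below_k1 = all(phi <= b0 + b1 * ph + tol for b0, b1 in k1_lines)
    above_k10 = all(phi >= b0 + b1 * ph - tol for b0, b1 in k10_lines)
    return below_k1 and above_k10


def grid_points(ph_min, ph_max, phi_min, phi_max, k1_lines, k10_lines):
    """Candidate points on a 0.1 pH x 1% grid that fall inside the feasible region."""
    points = []
    n_ph = round((ph_max - ph_min) / 0.1)
    for step in range(n_ph + 1):
        ph = round(ph_min + 0.1 * step, 1)
        for phi in range(phi_min, phi_max + 1):
            if inside(ph, phi, k1_lines, k10_lines):
                points.append((ph, phi))
    return points


# Hypothetical boundary lines (b0, b1) for two solutes.
K1 = [(96.0, -8.0), (59.0, -2.0)]
K10 = [(71.0, -8.0), (39.0, -2.0)]

if __name__ == "__main__":
    candidate = intersection(K1[0], K1[1])
    print("candidate vertex:", candidate, "is vertex:", inside(*candidate, K1, K10))
    print("number of grid candidates:", len(grid_points(4.5, 7.0, 15, 60, K1, K10)))
```

A small tolerance is used in `inside` so that points lying exactly on a boundary line, such as the intersection points themselves, are not rejected by floating-point rounding.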

Mapping. Compared to the simplex, where often only a small part of the feasible region is investigated, the design that is applied here maps the feasible region completely and efficiently. The experiments occupy different locations in the region and, consequently, contain different information. This fits in what is called an informative approach.

A set of experiments was selected with the uniform mapping algorithm of Kennard and Stone (ref 10). The resulting set of points is called a Kennard and Stone design only for convenience. It is a design in the sense that it yields a set of experimental points. A model using observations obtained at these points can be obtained (see step 3b), but in the informative approach (see step 3a), the points are not selected to obtain an optimal estimate of the coefficients of the model or to achieve optimal prediction. The Kennard and Stone algorithm (ref 10) is sequential and consists of maximizing the Euclidean distances between the newly selected point and the already selected points. This distance is calculated with the following equation:

$$d_{ij} = \sqrt{\sum_{l=1}^{k}\left(x_{il} - x_{jl}\right)^{2}} \qquad (5)$$

where l = 1, ..., k identifies the variables (here k = 2), i and j identify the two points between which the distance is measured, and x_{il} is the value of variable l at point i. An additional experiment is selected by computing, for each point i that is not selected yet, the distance to each selected point j and by maximizing the distance to the closest point that is already selected:

$$d_{\text{selected}} = \max_{i}\left(\min_{j}\left(d_{ij}\right)\right) \qquad (6)$$

As the algorithm of Kennard and Stone uses distances, effects due to the different scales of the experimental variables should be considered. This is done by transforming all pH values and volume fractions of the modifier, with intervals of 0.1 pH unit and 1%, respectively, to the same scale between 0 and 1. To that end, the maximum and minimum values of pH and of the concentration of the organic modifier in the feasible region are searched. The minimum value of each experimental variable is transformed to 0. The maximum value of the experimental variable with the largest number of units in the feasible region is transformed to 1. Suppose the minimum and maximum values of pH and of the volume fraction of modifier of a feasible region are, respectively, 4 and 7 and 30 and 45%. pH 4 and 7 are transformed to 0 and 1, respectively, as pH is the variable with the largest number of units (30 units of 0.1 pH against 15 units of 1%). 30 and 45% are transformed to 0 and 0.5.

As an example to illustrate the algorithm of Kennard and Stone, consider Figure 7. The first two experiments to be selected are the two vertices that are connected by the largest distance, namely, points 1 and 2. From these two points, distances to all points of the grid inside the polygon are calculated. From each pair of distances, namely, between point 1 and a possible point and between point 2 and the same possible point, the smallest distance is selected. The third point to be selected is the one with the largest smallest distance that is possible in the feasible region. This would be point 3 in Figure 7. Calculations are started over again, now using three instead of two selected points, until 10 points have been selected.

(10) Kennard, R. W.; Stone, L. A. Technometrics 1969, 11, 137-148.
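The scaling and the max-min selection rule of eqs 5 and 6 can be illustrated with the short Python sketch below. It is not the authors' K&S program: the function names and the five candidate points are hypothetical, and the sketch assumes that the first two points are simply the two most distant candidates, which coincide with the most distant vertices when the vertices are part of the candidate grid.

```python
import math


def scale(points, ph_step=0.1, phi_step=1.0):
    """Rescale (pH, % modifier) so the variable spanning most grid units runs 0 to 1."""
    ph_min = min(p[0] for p in points)
    phi_min = min(p[1] for p in points)
    ph_units = (max(p[0] for p in points) - ph_min) / ph_step
    phi_units = (max(p[1] for p in points) - phi_min) / phi_step
    units = max(ph_units, phi_units)
    return [((ph - ph_min) / ph_step / units, (phi - phi_min) / phi_step / units)
            for ph, phi in points]


def kennard_stone(points, n_select):
    """Select n_select points by repeatedly maximizing the distance to the closest pick."""
    scaled = scale(points)
    n = len(points)
    # Assumption: start from the two most distant candidates (eq 5 distances).
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda ij: math.dist(scaled[ij[0]], scaled[ij[1]]),
    )
    selected = [first, second]
    while len(selected) < min(n_select, n):
        remaining = [i for i in range(n) if i not in selected]
        # Eq 6: keep the candidate whose nearest selected point is farthest away.
        best = max(remaining,
                   key=lambda i: min(math.dist(scaled[i], scaled[j]) for j in selected))
        selected.append(best)
    return [points[i] for i in selected]


# Hypothetical candidate points (pH, % modifier).
CANDIDATES = [(4.0, 30.0), (7.0, 45.0), (5.5, 37.5), (4.0, 40.0), (6.0, 30.0)]

if __name__ == "__main__":
    print(scale([(4.0, 30.0), (7.0, 45.0)]))
    print(kennard_stone(CANDIDATES, 4))
```

Because the points are returned in the order in which they were selected, the list can be used directly as the sequence in which the experiments are carried out.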

[Figure 7. Experiments of the Kennard and Stone design in a feasible region for the mixture of tetra- and pentachlorophenols. Horizontal axis: pH.]

Step 3: Narrowing Down the Feasible Region. Once a set of experiments is generated, either an informative approach or a modeling one may be followed (Figure 1). Conventionally, an optimal combination of pH and volume fraction of organic modifier is selected by modeling (refs 11-14). Simple empirical linear (ref 13) and quadratic (ref 3) equations may result in a considerable lack of fit when not working in a very restricted pH region. More accurate models, such as the nonlinear models of Schoenmakers et al. (refs 11 and 12) and of Marques and Schoenmakers (ref 14), lead to good results but require an extensive amount of data. Moreover, as the nonlinear models are based on a fully developed theory, they apply only when that theory applies and may lack generality. It was shown, for instance, that a nonlinear model that yields the best precision of prediction on one type of stationary phase may not do so on another similar phase (ref 14). Most probably this conclusion also applies to linear models and thus to any model in general. Consequently, the selection of a convenient model for a particular separation is often not obvious, and relatively few tools are available for doing this (ref 15). Therefore, in step 3 of the approach we propose, acceptable solutions may be found in an informative way, without modeling the data. An option for modeling is also included, but here the described disadvantages have to be taken into account (Figure 1).

Step 3a: Informative Approach. The experiments of the design are performed in the order found by the algorithm of Kennard and Stone, and after each experiment it is investigated whether the separation meets the initially defined performance goals.

(11) Schoenmakers, P. J.; van Molle, S.; Hayes, C. M. G.; Uunk, L. G. M. Anal. Chim. Acta 1991, 250, 1-19.
(12) Schoenmakers, P. J.; Mackie, N.; Marques, R. M. L. Chromatographia 1992, 35, 18-32.
(13) Snyder, L. R. J. Chromatogr. 1992, 592, 183-197.
(14) Marques, R. M. L.; Schoenmakers, P. J. J. Chromatogr. 1992, 592, 157-182.
(15) Zupan, J.; Rius, F. X. Anal. Chim. Acta 1990, 239, 311-315.

We were interested in optimizing simultaneously the degree of separation, the analysis time, and the symmetry of all peaks in the chromatogram. The minimal resolution, Rmin, and the retention time of the last peak were selected as global optimization criteria to quantify the former two performance goals. As far as we know, no such global optimization criterion has been described for the symmetry of the peaks. The symmetry of one peak is expressed as the asymmetry factor (A), which is the ratio of the leading half of the peak to the trailing half, measured at 10% of the peak height. As asymmetry factors larger and smaller than 1 are both less desirable than A's that approach 1, one cannot simply calculate the sum or the product of the asymmetry factors of all peaks in the chromatogram. To eliminate this drawback, we propose to transform all A's to a scale between 0 and 1 by means of Derringer's desirability function (refs 5 and 16). A two-sided transformation is needed, represented by the following equation (ref 5):

$$
d_i = \begin{cases}
\left[\dfrac{Y_i - Y_i^{(-)}}{c_i - Y_i^{(-)}}\right]^{s} & \text{if } Y_i^{(-)} \le Y_i \le c_i \\[2ex]
\left[\dfrac{Y_i - Y_i^{(+)}}{c_i - Y_i^{(+)}}\right]^{t} & \text{if } c_i < Y_i \le Y_i^{(+)} \\[2ex]
0 & \text{if } Y_i < Y_i^{(-)} \text{ or } Y_i > Y_i^{(+)}
\end{cases}
$$

where $Y_i^{(-)}$ is the minimum acceptable value of criterion $Y_i$ and $Y_i^{(+)}$ is the value beyond which improvements would serve no useful purpose. Both have to be selected by the user. $c_i$ is a target value that can be selected anywhere between $Y_i^{(-)}$ and $Y_i^{(+)}$ and is set equal to 1 in this case; the exponents s and t set how quickly the desirability falls off on either side of the target. A's smaller or larger than 1 will rapidly lead to 0, while A's close to 1 will be transformed to a value close to 1. Working with the transformed A values allows us to judge the quality of the symmetry of all peaks in the chromatogram by combining the A's of individual peaks into a global optimization criterion, such as the sum or the product. For an ideal chromatogram, the former should equal the number of peaks and the latter should equal 1. It was decided to apply the sum of the transformed A's, which is called here Aglobal.

To optimize Rmin, the retention time of the last peak, and Aglobal simultaneously, a multicriteria decision-making (MCDM) method is required. The method of Derringer, which was shown to be convenient in RPLC, is applied to each chromatogram obtained. The method is based on the transformation of measured properties to a dimensionless desirability scale between 0 and 1. As a result, values obtained from different scales of measurement may be combined into one global criterion, which is called the overall quality. The overall quality is the geometric mean of the desirability values of the individual criteria. Individual criteria were transformed into desirability values by a one-sided transformation (ref 5), which for Rmin and Aglobal is represented as follows:

$$
d_i = \begin{cases}
0 & \text{if } Y_i \le Y_i^{(-)} \\[1ex]
\left[\dfrac{Y_i - Y_i^{(-)}}{Y_i^{(+)} - Y_i^{(-)}}\right]^{r} & \text{if } Y_i^{(-)} < Y_i < Y_i^{(+)} \\[2ex]
1 & \text{if } Y_i \ge Y_i^{(+)}
\end{cases}
$$

and for the retention time of the last peak, which has to be minimized

$$
d_i = \begin{cases}
0 & \text{if } Y_i \ge Y_i^{(+)} \\[1ex]
\left[\dfrac{Y_i - Y_i^{(+)}}{Y_i^{(-)} - Y_i^{(+)}}\right]^{r} & \text{if } Y_i^{(-)} < Y_i < Y_i^{(+)} \\[2ex]
1 & \text{if } Y_i \le Y_i^{(-)}
\end{cases}
$$

(16) Derringer, G.; Suich, R. J. Qual. Technol. 1980, 12, 214-219.
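The desirability transformations and the overall quality can be illustrated with a small Python sketch. It assumes the usual forms given by Derringer and Suich (ref 16); the function names, the exponent names s and t of the two-sided form, and every limit value in the example are hypothetical choices made only for illustration, not the settings used by the authors.

```python
import math


def one_sided_max(y, y_minus, y_plus, r=1.0):
    """Desirability of a criterion to be maximized (for example, resolution)."""
    if y <= y_minus:
        return 0.0
    if y >= y_plus:
        return 1.0
    return ((y - y_minus) / (y_plus - y_minus)) ** r


def one_sided_min(y, y_minus, y_plus, r=1.0):
    """Desirability of a criterion to be minimized (for example, analysis time)."""
    if y >= y_plus:
        return 0.0
    if y <= y_minus:
        return 1.0
    return ((y - y_plus) / (y_minus - y_plus)) ** r


def two_sided(y, y_minus, target, y_plus, s=1.0, t=1.0):
    """Desirability of a criterion with a target value (for example, asymmetry)."""
    if y < y_minus or y > y_plus:
        return 0.0
    if y <= target:
        return ((y - y_minus) / (target - y_minus)) ** s
    return ((y - y_plus) / (target - y_plus)) ** t


def overall_quality(desirabilities):
    """Geometric mean of the individual desirability values."""
    return math.prod(desirabilities) ** (1.0 / len(desirabilities))


if __name__ == "__main__":
    # Hypothetical limits: asymmetry acceptable between 0.5 and 2.0, target 1.0.
    asymmetries = [0.9, 1.2, 1.0, 1.6]
    a_global = sum(two_sided(a, 0.5, 1.0, 2.0) for a in asymmetries)
    d_values = [
        one_sided_max(1.2, 0.5, 1.5),      # minimal resolution
        one_sided_min(8.0, 5.0, 15.0),     # retention time of the last peak
        one_sided_max(a_global, 2.0, 4.0), # summed transformed asymmetry
    ]
    print(d_values, overall_quality(d_values))
```

Because the overall quality is a geometric mean, a single criterion with a desirability of 0 forces the overall quality to 0, whatever the values of the other criteria.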

As the value of the overall quality increases with increasing desirability values, the method allows us to investigate whether the separation meets the postulated optimization goals. If this is the case, then the optimization procedure is stopped; otherwise the next experiment is carried out. If none of the experiments of the design results in an acceptable separation, additional experiments are carried out, e.g., by investigating changes in the elution pattern as a function of the experimental variables.

Step 3b: Modeling Approach. When relying on a model, the usual design to be applied when constraints prevent the use of a spherical or rectangular design is a D-optimal design. A design is called D-optimal compared to other designs with the same number of experimental points if it has the smallest determinant (D) of $(X^{T}X)^{-1}$, where X is the matrix of parameter coefficients for the model and $X^{T}$ is the transpose of X (refs 17 and 18). The matrix X of parameter coefficients is based on the values of the experimental variables and the specification of the model. It is derived from a model, which is linear in the unknown parameters, as follows:

$$y = Xb + e \qquad (9)$$

where y is a vector (n × 1) that contains all the observed responses, X is the matrix of parameter coefficients (n × p), b is the vector (p × 1) of coefficients of the model, and e is a vector (n × 1) of errors representing the difference between the observed responses and the predictions by the model. In optimization, the model is usually quadratic. For two variables x1 and x2:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{11} x_1^{2} + \beta_{22} x_2^{2} + \beta_{12} x_1 x_2 + \varepsilon \qquad (10)$$

where x1 and x2 are the experimental parameters. Rewriting the model as the estimated response leads to the following equation:

$$\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_{11} x_1^{2} + b_{22} x_2^{2} + b_{12} x_1 x_2 \qquad (11)$$

where the b's are estimates of the β's. X is then given by

$$
X = \begin{bmatrix}
1 & x_{11} & x_{12} & x_{11}^{2} & x_{12}^{2} & x_{11}x_{12} \\
1 & x_{21} & x_{22} & x_{21}^{2} & x_{22}^{2} & x_{21}x_{22} \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
1 & x_{n1} & x_{n2} & x_{n1}^{2} & x_{n2}^{2} & x_{n1}x_{n2}
\end{bmatrix}
$$

where n is the number of experiments. It can be shown that the estimation of the b coefficients is best when D of $(X^{T}X)^{-1}$ is smallest. The selection of the set of experiments yielding the smallest D is an iterative procedure. Once one has decided on the number of experiments to be carried out, an initial configuration is defined, which is the start of the iterative optimization procedure. Often one uses for that purpose an algorithm such as that of Galil and Kiefer (ref 19). We decided to use Kennard and Stone's algorithm to select an initial set of experiments and afterwards applied a variant of the algorithm of Wynn (ref 20) for the iterative procedure itself. The latter algorithm is the simplest algorithm described. In a later stage of our research, we will investigate whether to replace it by the algorithm of Fedorov (ref 21). Reports in the literature about which algorithm is computationally more efficient are conflicting (ref 22).

The difference between the Kennard and Stone approach and the D-optimal design is that the latter depends on a model and aims to optimize the estimation of the parameters in the model, while the Kennard and Stone design aims at mapping the experimental area in an informative way. However, one can of course use the set of experiments generated with Kennard and Stone's algorithm instead of the D-optimal set of experiments to obtain a linear model or a nonlinear one (Figure 1). For instance, applying the Kennard and Stone algorithm in the feasible region seems a more reasonable way to select the measurements required for Schoenmakers' nonlinear model (ref 12) than the 3 × 4 experimental design applied by this author.

It should be noted immediately that the quadratic model will only be suitable when the sigmoid relationship between log k' and pH is not included completely in the experimental area, because sigmoid surfaces cannot be modeled with quadratic equations. The sigmoid shape will not be contained completely in the feasible area if the pH range is small enough. When it is not, one may need another approach based on the transformation of the data to obtain a response surface that is more amenable to the usual experimental design approaches with quadratic models. This transformation may be carried out with the logistic transform. We are currently investigating this.

(17) Deming, S. N.; Morgan, S. L. Experimental Design: A Chemometric Approach, 2nd ed.; Elsevier: Amsterdam, 1993.
(18) Box, G. E. P.; Draper, N. R. Empirical Model-Building and Response Surfaces; Wiley: New York, 1987; p 490.

EXPERIMENTAL CONDITIONS
Chromatographic Equipment and Parameters.
A Merck Hitachi liquid chromatograph provided with a Rheodyne injection valve (50-pL sample loop) was used to carry out the HPLC measurements. Detection was performed at 260 nm with a Shimadzu SPD-2A UV detector. The attenuation was set at 0.05 AUFS. The chromatograms were recorded with a SpectraphysicsSP 4290 integrator. A pH Stable Spherisorb S5 PC18 (20 X 0.46 cm) (Euro-Scientific) with a particle size of 5 pm was used. All separations were carried out at 30 OC and with a flow rate of 1 mL/min. pH of buffer solutions was measured with an Orion Research digital ionalyzer. Standards and Reagents. All chlorophenols were of reference grade and were obtained from Aldrich. The names and pKa values of the chlorophenols are listed in Table 4. Standard solutions in mobile phase with concentrations of approximately 100 mg/L were daily prepared by dilution of stock solutions in acetonitrile. HPLC-grade acetonitrile, sodium dihydrogen phosphate, phosphoric acid, citric acid, trisodium citrate, and sodium hydroxide of pro analysis quality were obtained from Merck (Darmstadt, FRG). The mobile phase was prepared from acetonitrile (ACN) and aqueous (19) Galil, Z.; Kiefer, J. Technomerrics 1980, 21. 301-313. (20) Wynn, H. P. J . R.Srar. Soc. 15V2, 8-31, 133-147. (21) Fcdorov, V. V. Theory ofOprimol Experlmmts;Academic Press: New York,

1972.

(22) Johnson, M. E.; Nachtsheim, C. J. Technomerrics 1983, 25,

271-277.

BOO Analytical Chemktry, Vol. 66, No. 6, mrch 15, 1884

buffer. Buffers were (1 / 1) stoichiometric mixtures of citrate and phosphate buffers with a total ionic strength of 0.05 mol/ L. They were prepared with water, purified in a Milli-Qsystem, and filtered through a 0.45-pm Millipore filter under vacuum. Procedures. Retention times were provided by the integrator. Peak widths and asymmetry factors (A) were measured manually at respectively half and 10% of the peak height. The number of experiments was reduced by chromatographing in one run several standards with large enough differences in retention. pH values were measured prior to mixing with the organic modifier. The holdup volume of the column was determined by injection of the mobile phase enriched with ACN. The gradient delay volume was determined with a linear gradient from 0%to 1% of ACN in MilliQ-water in 10 min. To have an idea of experimental error and to trace possible column degradation, the first performed experiment was repeated after finishing a series of experiments of a design. When mobile-phase pH was changed, the column each time was equilibrated for 2 h with the next mobile phase at a flow rate of 1 mL/min before injecting samples. Software. The Drylab I Plus software (L.C. Resources Inc., Lafayette) was used to derive isocratic mobile-phase compositions needed toconstruct the retention boundary map. This software allows isocratic simulations with an accuracy better than 5% starting from two initial gradient scans.7 For performing the rest of the calculations, two programs called Doehlert and K&S were written by the authors. The program Doehlert was already de~cribed.~ It calculates the model parameters in eq 10 for each compound and predicts retention times and peak widths for a large set of experimental conditions. The program K&S is written in Microsoft QuickBASIC 4.5. It defines the feasible region of all compounds separately starting from the mobile phase compositions obtained from the Drylab I Plus software. 
The program proceeds by defining the vertices of the feasible region for all compounds and selects up to 10 experiments in that region by applying the algorithm of Kennard and Stone. The program D-optimal is written in PC-Matlab and calculates the experiments of a D-optimal design starting from the experiments of the Kennard and Stone design. All programs were run on an IBM Personal System/2 Model 80.

RESULTS

To construct the retention boundary map in step 1, two gradient runs with the same initial and final concentrations of acetonitrile, namely, 10 and 50% and 25 and 80%, were run at pH 7 and 4.5, the upper and lower limits of pH. The concentrations for isocratic elution at k' = 1 and 10 for each compound, calculated with the Drylab software, were entered in the software for the vertex algorithm so that the vertices of a feasible region with k' values between 1 and 10 for all compounds were obtained. This region is a quadrangle (Figure 7). Concentrations of acetonitrile are smaller at pH 7.0 because, compared to pH 4.5, all compounds are ionized, more polar, and thus less retained by the stationary phase. In step 2, the possible experimental points in the region defined in step 1 are searched with the grid algorithm, and a set of experiments is then selected with the Kennard and Stone algorithm. In our example, the number of experiments of the Kennard and Stone design is set at seven to fit a quadratic

model to the data with 1 degree of freedom for the residuals, since at least three levels for each variable are present in the design. The seven experiments of the design are indicated in Figure 7 and listed in Table 1. All experiments are located close to the border of the feasible region. Four peaks were obtained only in the chromatograms of experiments 1, 5, and 7 (Figure 8), but none of these separations can be considered acceptable for all three criteria (Table 2). The chromatogram of experiment 1 is better than that of experiment 7 in analysis time and in the symmetry of all peaks, but its Rmin is much worse. The Rmin of the chromatogram of experiment 7 is quite satisfactory. However, the analysis time exceeds a k' of 10. At first sight, this should not happen because we defined the feasible region as having 1 < k' < 10. However, as stated earlier, the region is a rough guess. This results from assuming a linear relationship between retention and pH in that step.

One should decide in step 3 whether to perform regression or to apply an informative approach. The maximal pH span of the feasible region is about 1.9 pH units at constant volume fraction of acetonitrile. For most solvent strengths it is much less. Considering the pH range of the feasible region, the pKa values of the compounds, and previous results, it is probable that the feasible region does not include the complete sigmoid and therefore that modeling with quadratic models may yield acceptable results. Both the modeling approach with a linear model and the informative approach will therefore be investigated.

Table 1. Conditions for Experiments of Kennard and Stone Design for pH Region 4.5-7.0

expt    pH     % ACN
1       4.6    64
2       7.0    20
3       5.0    32
4       6.4    34
5       6.4    44
6       4.5    41
7       6.2    25

Table 2. Experimental Values of Three Optimization Criteria and of Overall Quality Criterion

experiment           Rmin    analysis time    Aglobal    D
1                    0.38    4.67             2.6        0.4
2                    0.00    22.90            2.1        0.0
3                    0.30    29.49            2.1        0.0
4                    0.00    7.93             1.6        0.0
5                    0.56    7.01             1.4        0.4
6                    0.10    10.01            1.6        0.0
7                    0.98    27.50            1.9        0.0
pH = 4.9, 48% ACN    0.79    5.96             3.4        0.8
pH = 5.1, 47% ACN    0.40    5.36             2.3        0.4

Figure 8. Chromatograms of the mixture obtained in experiments 1 (A), 5 (B), and 7 (C) (experimental conditions of experiments 1, 5, and 7: see Table 1; a = pentachlorophenol, b = 2,3,5,6-tetrachlorophenol, c = 2,3,4,6-tetrachlorophenol, d = 2,3,4,5-tetrachlorophenol).

(23) Wold, S.; Esbensen, K.; Geladi, P. Chemom. Intell. Lab. Syst. 1987, 2, 37-52.

In the informative approach, the feasible region is narrowed down by following the way of thinking of a chromatographic expert. An expert would search for changes in the elution order to improve the selectivity. If the peaks of the same two solutes are not well separated at two sets of experimental conditions that are close to each other, then one may be almost sure that the separation will not be better at conditions in between the two experiments. Therefore, changes in elution order are investigated between each experiment and the nearest surrounding experiments. The elution order in experiment 1, for instance, is compared to the order obtained in experiments 5 and 6; for experiment 4, a comparison is made with the order in experiments 5, 3, 7, and 2 (Figure 7). Changes in the elution pattern occur between experiments 1 and 5 and between experiments 4, 7, and 2. Experimental values of the three optimization criteria for these separations are given in Table 2. It was shown by principal component analysis (PCA)23 that for this application Rmin, Aglobal, and the analysis time show relatively little correlation, i.e., a significant amount of information is lost when working with only two of the three criteria. The three optimization goals were considered equally important by setting r = 1 in eq 8. This leads to an overall quality, D, of 0.4 for the chromatograms of experiments 1 and 5 (Table 2). The chromatograms of experiments 2 and 4 are considered unacceptable because of a pair of unresolved peaks: the peaks of 2,3,5,6-tetrachlorophenol and pentachlorophenol and those of 2,3,5,6- and 2,3,4,6-tetrachlorophenol show a complete overlap. The chromatogram of experiment 7 also leads to a D value of 0 because of the long analysis time and the considerable asymmetry, mainly of the last peak. As the experimental conditions of experiments 1 and 5 resulted in more desirable chromatograms, it was investigated

whether these results could still be improved by performing an experiment at conditions midway between those of experiments 1 and 5, for instance, at pH 4.9 and with 48% ACN (Figure 9). The analysis time obtained at these conditions is almost the average of those of experiments 1 and 5, but Rmin and the symmetry are much better, so that a D value of 0.8 is obtained for this chromatogram (Table 2). It might be that still better chromatograms are possible close to these conditions. The separation at pH 4.9 meets the performance goals we have postulated, so the optimization procedure is stopped. Better RPLC separations of the same mixture, also achieved by optimizing pH and solvent strength, were reported in the literature.3 As the authors observed a considerable degradation of the traditional silica phase they were working on, a pH-stable phase was selected in this application. Differences in separation quality are therefore due to differences in efficiency and selectivity of the stationary phases.

Figure 9. Chromatogram of the mixture obtained at the experimental conditions pH = 4.9, concentration of acetonitrile = 48% (a = pentachlorophenol, b = 2,3,5,6-tetrachlorophenol, c = 2,3,4,6-tetrachlorophenol, d = 2,3,4,5-tetrachlorophenol).

The other possibility is to use a modeling approach, either with the D-optimal design or with the experiments selected by Kennard and Stone. The D-optimal solution for seven points was close to the Kennard and Stone selection, so it did not seem useful to carry out the D-optimal experiments, and the modeling was performed with the Kennard and Stone experiments. A second-order model was fitted to the experimental data of the seven experiments. For the separation at pH 4.9 this led to errors in the prediction of retention times of at least 20%. This is probably due to a feasible region that is too large and includes more than one leg of the sigmoidal curve of retention versus pH.

The precision of predictions obtained with the Kennard and Stone (near D-optimal) design with seven experiments is compared to the precision obtained with a Doehlert design.4 The latter necessarily has one or more experiments outside the feasible region. The feasible region differs slightly from the one in Figure 7 as the experiments were carried out on another, similar stationary phase (same type and manufacturer). The experimental conditions of the designs are listed in Table 3. Experiments 3, 7, and 4 of the Doehlert design fall outside the feasible region, so that capacity factors smaller and larger than desired are obtained (Figure 1). In experiment 7 of the Kennard and Stone design, the capacity factors were larger than 10, as was explained earlier.

Table 3. Conditions for Experiments of Doehlert Design and Kennard and Stone Design

              Doehlert           K&S
experiment    pH     % ACN       pH     % ACN
1             4.6    36          4.6    62
2             6.8    36          7.0    20
3             7.0    36          4.9    30
4             6.1    26          6.2    32
5             6.1    46          6.3    42
6             6.4    26          4.6    38
7             6.4    46          6.1    24

To compare the effectiveness of both designs, the sum of squares of the absolute deviations (SSQ) and the average relative deviation (ARD, in percent) are calculated for the retention times of each compound over the experiments of the design (Table 4). SSQ and ARD are markedly smaller for the Kennard and Stone design for all compounds, which indicates that the modeling quality is much better.

Table 4. Comparison of Accuracy of Predicted Retention Times for Experiments of Doehlert Design and Kennard and Stone Design (a)

                      K&S                 Doehlert
name (b)    pKa       ARD (%)    SSQ      ARD (%)    SSQ
2346        6.22      6.6        0.6      36.6       24.2
2356        6.02      6.8        0.7      29.4       13.7
2345        6.64      6.0        4.6      39.3       70.9
penta       4.74      6.2        1.2      8.0        1.0

(a) Experimental conditions: see Table 3; SSQ = sum of squared deviations between calculated and experimental data; ARD = average relative deviation between calculated and experimental data. (b) Chlorosubstituent positions.

The precision of predictions was examined with three additional experiments that differ from the experiments of the designs (Table 5). As could be expected, SSQ and ARD are much larger than for the experiments of the design (Table 4) but again are significantly smaller for the Kennard and Stone design.

Table 5. Comparison of Accuracy of Predicted Retention Times for Three Additional Experiments in Feasible Region with Doehlert Design and Kennard and Stone Design

experiment    pH     % ACN
1             6.8    26
2             6.4    41
3             6.6    30

            K&S                 Doehlert
name (a)    ARD (%)    SSQ      ARD (%)    SSQ
2346        8.7        1.6      43.6       46.6
2356        13.0       1.9      66.6       46.8
2345        24.2       122.0    27.6       124.3
penta       20.0       6.8      43.8       37.7

(a) Chlorosubstituent positions.
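The two figures of merit used in Tables 4 and 5 can be computed as in the following minimal sketch. It assumes that ARD is taken relative to the experimental retention time, which the table footnote does not state explicitly. The function names and the retention times in the usage lines are hypothetical and are not data from this work.

```python
def ssq(predicted, observed):
    """Sum of squared deviations between predicted and observed retention times."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def ard_percent(predicted, observed):
    """Average relative deviation in percent (assumed relative to the observed value)."""
    deviations = [abs(p - o) / o for p, o in zip(predicted, observed)]
    return 100.0 * sum(deviations) / len(deviations)


# Hypothetical retention times (min) of one compound in three experiments.
observed = [4.0, 10.0, 20.0]
predicted = [4.2, 9.0, 21.0]
print(round(ssq(predicted, observed), 2), round(ard_percent(predicted, observed), 1))
```

Because SSQ grows with the square of the retention time while ARD is scale free, a late-eluting compound can show a large SSQ together with a moderate ARD, which is why both values are worth reporting side by side.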

Figure 10. Experiments of the Kennard and Stone design and the Doehlert design fitted in the reduced feasible region (axes: pH and % CH3CN).

Table 6. Conditions for Experiments of Doehlert Design and Kennard and Stone Design in pH Region 4.5-5.5

              Doehlert           K&S
experiment    pH     % ACN       pH     % ACN
1             5.0    29          4.5    52
2             5.0    37          5.5    27
3             5.0    45          4.5    37
4             4.5    33          5.5    41
5             4.5    41          4.8    45
6             5.5    33          4.8    30
7             5.5    41          5.4    34

Table 7. Comparison of Accuracy of Predicted Retention Times for Experiments of Doehlert Design and Kennard and Stone Design in pH Region 4.5-5.5 (a)

         K&S                 Doehlert
compd    ARD (%)    SSQ      ARD (%)    SSQ
2346     0.2        0.0      4.7        1.1
2356     1.0        0.0      0.5        0.0
2345     8.5        9.3      6.2        4.8
penta    0.8        0.0      2.1        0.1

(a) Experimental conditions: see Table 6.
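The Doehlert conditions of Table 6 follow the usual two-factor Doehlert geometry, a center point surrounded by a regular hexagon. The sketch below maps the coded points onto pH and % ACN. The center (pH 5.0, 37% ACN) and the half-ranges (0.5 pH unit, 8% ACN) are inferred here from Table 6, and the function name is hypothetical; this is an illustration and not the authors' Doehlert program.

```python
import math

# Coded two-factor Doehlert design: center point plus a regular hexagon.
# The point order is chosen to match the experiment numbering of Table 6.
H = math.sqrt(3) / 2
CODED = [(-1, 0), (0, 0), (1, 0), (-0.5, -H), (0.5, -H), (-0.5, H), (0.5, H)]


def doehlert(center_acn, half_range_acn, center_ph, half_range_ph):
    """Map coded points to (pH, % ACN); % ACN is the five-level factor."""
    design = []
    for x1, x2 in CODED:
        acn = center_acn + half_range_acn * x1
        ph = center_ph + half_range_ph * (x2 / H)  # x2 / H is -1, 0 or +1
        design.append((round(ph, 2), round(acn, 1)))
    return design


design = doehlert(center_acn=37, half_range_acn=8, center_ph=5.0, half_range_ph=0.5)
for number, (ph, acn) in enumerate(design, start=1):
    print(number, ph, acn)
```

The hexagon is fully determined by its center and two half-ranges, so it cannot adapt to an irregular region; it only fits when the region is reduced enough to contain it.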

As the same model was applied in the Doehlert design and in the Kennard and Stone design, the much better accuracy of predictions obtained with the latter design is due to the location of the experiments in the feasible region. However, if the feasible region is reduced so that a Doehlert design fits in without excluding an important part of the domain from investigation, the precision of predictions might become as good as with the nonspherical design. To investigate this, the pH region was reduced to 4.5-5.5 (Figure 10), and the precision of predictions with a Doehlert design fitted in this region was compared to the precision obtained with experiments selected by Kennard and Stone. The experimental conditions of the two designs are given in Table 6. For both designs, ARD and SSQ values are now much smaller than when working in the whole feasible region (Table 7). This is due to the smaller parameter space, which ensures that for all solutes only one (part of the) "leg" of the sigmoidal curve of retention versus pH is observed, so that the data are better described by the quadratic model. Differences in ARD and SSQ values between the two designs are also considerably smaller, which might be explained by the Doehlert design being better spaced throughout the new, reduced feasible region. If the Doehlert design fits well in the feasible region and one wants to apply modeling, it is recommended over a Kennard and Stone selection because of its central experiment. A central experiment is often absent in nonspherical designs with a small number of experiments, so that no information on the chromatographic behavior of the compounds inside the feasible region is obtained.

An essential requirement when modeling and performing predictions is that the results do not change when experiments are repeated. It is known that degradation of the stationary phase may cause the results to drift and thus decrease the precision of prediction in experimental design. Although the stability of polymer-coated silica phases and the reproducibility of results were shown to be superior to those of traditional silica phases and polymer phases, column performance should be continuously monitored during chromatography. Therefore, the first experiment was repeated after each series of experiments of a design. Differences in retention times for the Doehlert design and the Kennard and Stone design were of the order of 0.5%, which is not larger than the normal variation between different days.

CONCLUSION

One of the first steps of an optimization should be the definition of a region in which experiments yield relevant results. Experiments are limited to this region, thereby avoiding experiments at locations that are not useful from a practical point of view and that, for some substances, lead to measuring at very low or very high k'. If the feasible region is not symmetric (and it very rarely is), several approaches are possible. One of them is to map the feasible region. For this purpose the algorithm of Kennard and Stone can be applied, as it is expected to provide a uniform spacing of experiments. Modeling is also possible. D-optimal designs or the experiments selected with Kennard and Stone can be used. The example we have chosen has the added difficulty that nonlinear models should be preferred to quadratic models, except when the pH range is small.
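The uniform spacing provided by the Kennard and Stone selection can be illustrated with a short sketch. It uses the common formulation of the algorithm: start with the two most distant candidates, then repeatedly add the candidate whose smallest distance to the points already selected is largest. The quadrangle vertices, the grid steps, the range scaling of both variables, and all function names are assumptions made for this illustration; they are not the values or the code of the K&S program.

```python
import itertools
import math


def inside(point, polygon):
    """Ray-casting test: is the point strictly inside the polygon?"""
    x, y = point
    hit = False
    for k in range(len(polygon)):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % len(polygon)]
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            hit = not hit
    return hit


def kennard_stone(candidates, n_select):
    """Return the indices of n_select candidates chosen by the maximin rule."""
    if not 2 <= n_select <= len(candidates):
        raise ValueError("n_select must lie between 2 and the number of candidates")
    # Assumption: scale every variable to 0..1 so that pH and % ACN weigh equally.
    columns = list(zip(*candidates))
    low = [min(c) for c in columns]
    span = [(max(c) - min(c)) or 1.0 for c in columns]
    scaled = [[(v - l) / s for v, l, s in zip(p, low, span)] for p in candidates]

    def dist(i, j):
        return math.dist(scaled[i], scaled[j])

    # Start with the two candidates that are farthest apart.
    pairs = itertools.combinations(range(len(scaled)), 2)
    selected = list(max(pairs, key=lambda pair: dist(*pair)))
    while len(selected) < n_select:
        remaining = [i for i in range(len(scaled)) if i not in selected]
        # Add the candidate farthest from its nearest already-selected point.
        selected.append(max(remaining, key=lambda i: min(dist(i, j) for j in selected)))
    return selected


# Hypothetical quadrangular feasible region, vertices given as (pH, % ACN).
REGION = [(4.5, 40), (4.5, 65), (7.0, 35), (7.0, 20)]
grid = [(k / 10, acn) for k in range(45, 71) for acn in range(15, 71)]
candidates = [p for p in grid if inside(p, REGION)]
chosen = kennard_stone(candidates, 7)
for i in chosen:
    print(candidates[i])
```

Since each new point maximizes its distance to the points already chosen, the first selections fall on the border of the region, which agrees with the observation that all seven experiments lie close to the border.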
A design based on Kennard and Stone offers the advantage that it can be used in combination with all kinds of models, including nonlinear models. It may well be that the combination of a retention boundary map to determine a feasible region, the Kennard and Stone algorithm to select a set of experiments, and Schoenmakers' nonlinear model will prove to be the best for the simultaneous optimization of pH and solvent strength when a modeling approach is followed. If a classical design fits in the feasible region without leaving a large part uninvestigated, then it may be preferred to a design based on Kennard and Stone or a D-optimal design.

In the last step of the informative approach, the method of Derringer is applied to select, among a series of chromatograms, the one(s) that best meet(s) the performance goals. If none of the experiments of the design results in a separation acceptable for the different optimization goals, it is investigated whether there are peak reversals. Additional experiments are then carried out by chromatographic reasoning. In this way, the feasible region is restricted sequentially, and each step is based on the chromatographer's expertise. One can argue that eight experiments in total were performed to reach the final acceptable chromatogram in the example. However, the optimization is sequential. This means that not all experiments necessarily have to be carried out; no further change in separation conditions is required once one of the obtained separations meets the goals of the HPLC method.
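The way an overall quality D combines several criteria can be sketched as follows. Eq 8 is not reproduced in this section, so the sketch assumes the usual weighted geometric mean form of Derringer's overall desirability, with equal weights standing in for r = 1. The function name and the individual desirability values are hypothetical.

```python
import math


def overall_quality(desirabilities, weights=None):
    """Weighted geometric mean of individual desirabilities, each in 0..1."""
    if weights is None:
        weights = [1.0] * len(desirabilities)  # equal importance (assumed r = 1)
    if any(not 0.0 <= d <= 1.0 for d in desirabilities):
        raise ValueError("each desirability must lie between 0 and 1")
    if any(d == 0.0 for d in desirabilities):
        return 0.0  # one unacceptable criterion rejects the whole chromatogram
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / sum(weights))


# Hypothetical desirabilities for resolution, analysis time and peak symmetry.
print(round(overall_quality([0.6, 0.9, 0.7]), 2))
print(overall_quality([0.0, 1.0, 0.8]))
```

With a geometric mean, a single criterion of zero desirability forces D to zero, which matches the D values of 0.0 in Table 2 for chromatograms with a pair of completely overlapping peaks.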


ACKNOWLEDGMENT

B.B. and D.L.M. thank the National Fund for Scientific Research, and P.F.deA. thanks the Conselho Nacional de Desenvolvimento Cientifico e Tecnologico (CNPq) for financial assistance. The authors thank K. Decq for skillful technical assistance.

Received for review June 9, 1993. Accepted December 8, 1993.

Abstract published in Advance ACS Abstracts, February 1, 1994.