Target transformation factor analysis with linear inequality constraints

Department of Chemistry, East Carolina University, Greenville, North Carolina 27834. A new algorithm for performing curve resolution of overlapped...
1 downloads 0 Views 951KB Size
2656

Anal. Chem. 1986, 58, 2656-2663

Target Transformation Factor Analysis with Linear Inequality Constraints Applied to Spectroscopic-Chromatographic Data Paul J. Gemperline Department of Chemistry, East Carolina University, Greenville, North Carolina 27834

A new algorithm for performbrg curve resolution of overlapped chromatographic peaks is presented that uses target transformatlon factor analysis with automatically generated linear inequality constraints in the concentration domain. The constraints are used to yield elution profiles that are nonnegative and have slngle maxima. No assunptions are made regarding the shape of the elution profiles. The method Is not limited to systems of two or three overlapped components, and an example is given showing the resolution of four overlapped peaks.

The problem of ascertaining peak purity in chromatography is important in analytical chemistry. All qualitative and quantitative analyses that use chromatography rely on pure peaks for accurate results. Contaminants in the form of overlapped peaks will adversely effect the quality of the analysis. Often, peak overlap can be so severe that no obvious evidence of contamination may be present. Two recently published papers have indicated that the occurrence of overlapped peaks may be much more prevalent than previously expected (1,2). Techniques to detect the presence of overlapped peaks and perform curve resolution will therefore greatly increase the reliability of many analyses and will be of vital interest to all chromatographers. The first application of factor analysis to perform curve resolution was described in 1971 (3). Since that time, it has been successfully applied to resolve overlapped peaks in gas chromatography/mass spectrometry (GC/MS) data ( 4 , 5 )and liquid chromatography/ultraviolet (LC/UV) data (6, 7). Prior to curve resolution, principal component analysis (PCA), a form of factor analysis, should be used to detect the presence of overlapping peaks. Several papers have shown PCA to be an ideally suited tool for detecting the presence of overlapped chromatographic peaks from multichannel detectors (8-11). Malinowski developed criteria for determining the number of principal components during the factor analysis of a data matrix (12,13) and then used it to determine the number of components in unresolved liquid chromatographic fractions measured with ultraviolet spectroscopy (LC/UV) (11). Kowalski et al. were able to resolve overlapped peaks from GC/MS data for binary mixtures and later modified their method to perform curve resolution in liquid chromatography using multiwavelength ultraviolet-visible detection (LC/UVvis) (5, 6). Both methods estimate the spectra of the pure components in regions of minimum component overlap through the application of inequality constraints. The main advantage of their method is that no assumptions are made regarding the shape of the elution profiles Most recently, an extension of their method to perform resolution of three overlapped components was reported (14). The technique was demonstrated using simulated data. Further studies showing the resolution of experimental data may prove the extension to three components to be valuable. The method of Hwang and Chen (IS),which was later modified by Vandeginste et al. ( I @ , was used to perform curve

resolution of up to three overlapped peaks in both GC/MS data and LC/UV-vis data, respectively. Initial estimates of either the pure spectra or elution profiles of the underlying components were produced through the application of constraints to prohibit negative absorbances or negative elution profiles. A new, third constraint was formulated to select the simplest spectra or simplest elution profiles from a band of feasible solutions. The term “simplest” was defined mathematically as those curves that had the smallest area-to-norm ratio. In the case of simplest elution profiles, the third constraint has the advantage of selecting the sharpest peaks. Meister reported a method for estimating the nonnegative spectra of pure components from mixtures using a linear programming approach (17). The method was not limited to mixtures of two or three components, and an example was given showing the estimated spectra of the pure components in four-component mixtures. A unique solution was obtained by maximizing the dissimilarity between the spectra of the individual components. Other approaches to curve resolution of chromatographic data from multichannel detectors have been reported. Harris et al. used an exponentially modified Gaussian function as an elution profile (18). This function was fit to the eigenvectors obtained from principal component analysis of GC/MS data and, later, LC/UV data (18, 19). A simplex algorithm was used to adjust initial estimates of three parameters per component. These included the retention time, Gaussian bandwidth, and tailing parameters. Results showing the resolution of up to seven overlapped components were shown. Delaney and Mauro reported a spectral library search method to obtain estimates of the “pure spectra” of overlapped peaks (20). These estimates were then used to perform curve resolution of the overlapped peaks. Simulated gas chromatography/infrared (GC/IR) data were used in this study. The author of this paper recently described an application of factor analysis to detect the presence of overlapped LC peaks using simulated data (7). In that paper, a new, iterative method of target testing was reported, which was used to produce estimates of the elution profiles of overlapped peaks and the spectra of the underlying components. The elution profiles were not forced to fit any predefined function. The technique was not limited to binary systems, and examples were given showing‘theresolution of three and four overlapped components. Examples were also given to demonstrate the effects of peak separation and random noise on the accuracy of the results. Vandeginste et al. later reported the identical method using Varimax factor rotation to obtain the initial estimates of the target test vectors (21). Since the initial report, several difficulties with the iterative TTFA curve resolution method have been identified in this laboratory. In the process of addressing these difficulties, the new method of target transformation factor analysis with automatically generated linear inequality constraints (TTFA-LIC) was developed. These difficulties, their solution, and the application of TTFA-LIC are subjects of the remainder of this paper. Nearly all previously published curve resolution papers relied heavily on simulated data. Likewise, simulated data is used in this paper to test the effects of noise, peak sepa-

C 1986 American Chemical Society 0003-2700/86/0358-2656$01,50/0

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13, NOVEMBER 1986

2657

where b is the cell path length, Ci,k is concentration of component k during scan i, and ek is the molar absorptivity of component k a t wavelength j . In matrix form, eq 1becomes

[AI = b[CI[El

(2)

where [A] is the i X j matrix of absorbance data a t i scans and j wavelengths for n components, [C] is the i X n concentration matrix, and [E] is the n X j pure absorptivity matrix. Each column of [C] represents the concentration profile of one component, while the corresponding row of [E] represents the spectrum of that pure component. It is the estimation of [C] and [E] that is the goal of curve resolution. In the original iterative TTFA method and in TTFA-LIC, the covariance matrix [Z] is constructed from the raw data matrix [A],

[ZI = [AlT[A1

Figure 1. HPLC absorbance matrix, [A], for four-component mixture of benzophenone, toluene, naphthalene, and biphenyl.

ration, and peak skew on the accuracy of the results. In order to avoid the limitations of relying entirely on simulated data however, curve resolution of experimental data measured Lising the LKB Model 2140 photodiode array detector is presented.

THEORY The TTFA-LIC method of curve resolution requires data from multichannel detectors that exhibit a linear, additive response. In order to observe linear response during periods of varying concentration encountered in chromatography, serial measurement of multiple channels must be performed a t a sufficiently high rate so that changes in the concentration of the analytes during the scanning interval are negligible. In certain cases even the very short scan times (ca. 12 ms) employed in photodiode array detectors can lead to the observation of a nonlinear response. For example, a peak that has AU during a rise time of 0.5 AU/s will change by over 6 X the course of one scan, potentially introducing an error which is an order of magnitude greater than the stated noise level AU). of the detector (5 X Throughout the rest of this paper, reference will only be made to the use of photodiode array detectors (PDA) for high-performance liquid chromatography (HPLC). The reader should understand that the method is general and may be used in conjunction with any multichannel detector for chromatography that meets the above requirements. In the case of HPLC PDA, a raw data matrix, [A],,,, is constructed whose rows correspond to the absorption spectra of the HPLC effluent measured at regular intervals in time. Successive rows in [A],,, are acquired over a period of time during which a cluster of peaks is observed to elute. Each column in [A], corresponds to the chromatogram one would obtain using a single-wavelength detector. As an example, a raw data matrix showing the incomplete separation of benzophenone, toluene, napthalene, and biphenyl is plotted in Figure 1. The conditions under which separation was performed are detailed in the Experimental Section. In regions of peak overlap, the rows (spectra) in [A],,, are linear combinations of the spectra of the pure components. The total absorbance due to n components, ALJ, measured at time i and wavelength j is written according to Beer's law

(3)

where [AITis the transpose of [A]. By using the covariance matrix instead of the correlation matrix, the origin and absolute magnitude of the vector space are preserved. The resulting covariance matrix is decomposed by use of principal component analysis to produce the set of n principal eigenvectors, [Q],, and the corresponding eigenvalues, A,. The number of significant eigenvectors is deduced by comparing the known noise characteristics of the detector with the estimate of the root mean square (RMS) error in the raw data obtained during principal component analysis (13). The RMS error, or real error (RE) is estimated according to eq 4 (12), X

RE = ( C

j=n+l

X j / [ y ( ~- n)])'I2

(4)

where z is the number of rows or columns in [A], whichever is smaller, y is number of rows or columns, whichever is larger, n is the number of principal components, c is the number of columns in [A], and X j are the eigenvalues of [A]. An abstract solution is formulated from the eigenvectors according to eq 5-7 where [EIsbtris the set of abstract spectra. = [Clabst[Elabstr

(5)

[Elab~tr= [Qllf

(6)

[Clabstr = [AI[&],

(7)

The columns of [CIabstrin eq 7 are the mutually orthogonal, unnormalized abstract concentration vectors. As an example, Figure 2 shows a plot of the abstract concentration vectors obtained from the principal component analysis of a twocomponent mixture containing acetophenone and nitrobenzene. The remaining problem is to find a transform matrix, [TI, which rotates the abstract concentration vectors to estimates of the true concentration vectors of the pure components. To that end, the predicted concentration matrix, [Clpred, is calculated according to eq 8. The predicted spectrum matrix, [EIpred,may be easily found according to eq 9 once [TI is determined. In the target transformation method a test

[clpred = [clabstr[Tl

(8)

vector, Ctest,is chosen which by hypothesis is a model of a suspected elution profile. The remaining problem is to find a tranform vector, Ti,which represents Ckstas a linear combination of [CIabstr (&st Ti

=

[ClabstrTi ~~l-'[cl~bstr~test

(10) (11)

The tranformation vector in eq 11 has been shown to be a

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13, NOVEMBER 1986

2658

2.m

1

.a

In other cases, especially in the presence of high levels of noise, it has been found that the above criteria prematurely terminate the iterative process. In these cases, much better results would have been obtained had the iteration been allowed to proceed for more steps. Finally, in certain circumstances the iterative process converged toward the desired results too slowly. Repeated attempts to refine the termination criteria failed to resolve the problem. Instead, a different approach to the problem was taken. Since the process of target transformation has been shown to produce a least-squares result, it became apparent that the techniques of constrained optimization might be used in such a way that automatically generated constraints could be incorporated into the target transformation procedure to directly yield nonnegative elution profiles. The problem of finding the target transformation vector is restated to include constraints in the following manner. Find the solution vector, Ti, which minimizes

n

1.28

u a

r a0

0.88

I

0.48

IIctest

- [ClabstrTill

(15)

subject to the simultaneous system of constraints

[GIT; L H

Time (s)

Flgwe 2. Abstract concentration vectors from the principal component

analysis of overlapped acetophenone and nitrobenzene peaks. least-squares result that minimizes the sum of the squares of the difference in eq 12 (22). In eq 13 the notation, llVll, for llctest

- [ClabstrTill

(12)

some arbitrary vector, V, represents the Euclidian length or the Euclidian norm of that vector. The transformation vector

llVll = (VTV)'I2 =

n (CUi2)'/2

(13)

1=1

is used to calculate the predicted concentration vector, Cpred, Cpred

=

[ClabstrT~

(14)

If the initial hypothesis was a good one, that is C,,, is a good model of the real elution profile, then Cpred Ctest. In the original iterative TI'FA procedure, n test vectors were selected, where n is the number of overlapping peaks, which were then refined in an iterative fashion, each time producing a better estiamte of the underlying elution profile (7). Two criteria were used to terminate the iterative process, one based on the estimated error (lack of fit) in the test vector and the other based on the change in the truncated area between successive iterations. Through extensive use of the iterative TTFA technique, it has been found that termination of the iteration after the proper number of cycles is the single most important factor influencing the accuracy of the results. Unfortunately, it has also been found to be the most difficult step in the process. If the iterative process is allowed to proceed too long, the estimated elution profile often becomes wider than the underlying profile. Thereafter, the predicted vector often develops a secondary maximum that accommodates the profiles of additional underlying components. Termination criteria do not discriminate between the width of successive estimates, since in either case both converge to nonnegative linear combinations of the abstract concentration vectors. In certain cases, the iterative process produced excellent estimates of the elution profiles in a few number of steps (3-5), but the termination criteria allowed the iteration to continue, leading to bad results.

(16)

where [GI is an m X n matrix that contains the coefficients for constraints and H is an m vector. Each row of [GI (constraint) defines a boundary (hyperplane) that bisects the solution space, T. In half of the solution space feasible solutions exist, while no feasible solution exists in the other. For the j t h constraint, any solution point, Ti, that lies in the feasible half space has GjTi - h, > 0. The algorithm for performing least-squares analysis with linear inequality constraints published in the monograph of Lawson and Hanson was adapted for use in the target transformation process (23). The algorithm is numerically stable and has been shown to converge to a unique solution in a finite number of steps. Lawson and Hanson's algorithm offers the added advantage that no assumptions need to be made regarding the rank of [GI or the row or column dimensions of [GI. The algorithm also identifies systems of constraints that are inconsistent, that is, no feasible solution exists. Readers desiring a detailed theoretical treatment and proof of the algorithm are referred to the original work. Application of TTFA-LIC to Curve Resolution. In order to obtain nonnegative estimates of the elution profiles of the underlying components, TTFA-LIC is used to implement the physically meaningful constraint Cpred

0

(17)

A simple substitution from eq 14 allows eq 17 to be rewritten as [ClabstrTi

0

(18)

The problem is now restated in a form suitable for TI'FA-LIC. Find the solution vector, Ti, that minimizes IIctest

- [ClabstrTill

(19)

subject to the constraint [ClabstTi

0

(20)

The uniqueness test is performed as in the iterative TTFA method of curve resolution to locate the retention times of the n underlying elution profiles and select n test vectors. The test vectors that correspond to the n components are then used in the TTFA-LIC method to calculate n constrained solution vectors, T,. Each constrained solution vector is used to form one column in the transformation matrix, [TI, thus solving the curve resolution problem. For the sake of efficiency, the predicted vector from the unconstrained TTFA result is examined to identify regions

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13, NOVEMBER 1986

where constraints are necessary to prevent negative concentrations. Instead of substituting the entire [C],, matrix for [GI, one constraint is selected for each index in the predicted concentration vector at or past the boundaries marked by the first negative point as one moves left or right from the peak maximum. The corresponding row from [CIabtris copied to the constraint matrix, [GI. Since the constraints are derived from experimental data, they contain measurement error which adversely affects the accuracy of the constrained results. Dramatic improvements were obtained when steps were taken to compensate for experimental error in the constraints. The presence of error in the constraints leads to uncertainty in the exact location of the hyperplanes; consequently, the constraints can be considered to have a finite “thickness”, where the thickness corresponds to the uncertainty in their locations. Constraints that are free of experimental error are infinitely thin hyperplanes. The “thickness” of the experimentally derived constraints has the effect of unnecessarily “shrinking” the hyperpolyhedron of feasible solutions. In order to compensate for this effect, the constraints are “relaxed” by a constant, 6i, according to eq 21 where 6i is a function of the error in the constraints, REP. This has the effect of allowing slight [ClabetrTi

-

6i = w~REP

(21)

REP = RE(TDi)lJ2

(23)

In eq 23, the value of the constrained solution vector, T i , is unknown. For the purposes of estimating REP, the solution vedor obtained from the unconstrained ?TFA method is used as an estimate of the constrained solution vedor and has been found to give acceptable results. Even through application of the above constraints, acceptable results are not always obtained. Secondary maxima in the constrained elution profiles sometimes appear a t retention times that coincide with the retention time of other components. A method has been developed to automatically generate constraints that suppress secondary maxima. Consider the point, cprdj,in the predicted concentration vector, Cpd,where the concentration of eluate of interest reaches its maximum value. The set of inequalities described in eq 24a,b can be written to force elution profile to be monotonic increasing before the maximum and monotonic decreasing after the peak maximum, respectively, where k is the position of

- Cpredj+l 3 0 for all j 2 k Cpredj - Cpred j-1 3 0 for all j < k

Cpredj

(244 (24b)

the maximum. The inequalities in eq 24a and 24b can be rewritten using the relationship in eq 25, where Cabstrjis the j t h row of [C]abstr, giving the set of inequality constraints to Cpredj

= CabstrjTi

0.40

0.30

4J

2 n

8

2 0.28 al

.“ d I

d 0 . 10

0.

ea,

,

(22)

negative deflections in the elution profiles in regions where the concentration of the respective component is very nearly zero. The weighting factor, w i , in eq 22 allows slight adjustment in the thickness of the constraints. It does not seem to be a critical parameter, and values between 1/21/2 and 4 have been tried and found to give acceptable results. A weighting factor of 1.0 is currently being used for the nonnegative constraints described by eq 20. The error in the constraints is estimated from the real error in the predicted target, REP, according to Malinowski’s procedure (24)

(25)

suppress secondary maxima shown in eq 26a,b. A new row (Cabstrj

- C a b s t r j + J T i 3 0 for all j 3 k

(26a)

(Cabstrj

- C a b s t r j - J T i 3 0 for d l j < k

(26b)

2659

Time (s) Figure 3. Predicted concentration vector for benzophenone from overlapped benzophenone, toluene, and nitrobenzene peaks without constraints to suppress secondary maxima.

is appended to the original nonnegative constraint matrix, [GI, for each constraint in eq 26a and 26b. In the interest of computational efficiency, the predicted concentration vector from the unconstrained TTFA method is searched by moving left and right from the peak maximum to locate regions where the magnitude of the points begin to increase toward a secondary maximum. Constraints are automatically generated at these boundaries and for all successive or preceding points, depending on which side of the peak is being processed, by T o compensate subtracting the appropriate rows of [c]&8tr. for the effect of random error in the constraints, the constant 6iis calculated according to eq 22 using a weighting factor wi = 21J2and is used to replace 0 on the right hand side of each of the j constraints (Cabstrj

- Cabstrj+l)Ti

-ai

(27)

Figure 3 shows a plot of the predicted concentration vector of the first component of a three-component mixture using the TTFA-LIC method without constraints to suppress secondary maxima. The three components are benzophenone, toluene, and naphthalene. For comparison, Figure 6 shows the results obtained after the application of the constraints to suppress secondary maxima. By combining the nonnegative constraints and the constraints to suppress secondary maxima into one constraint matrix, a satisfactory solution vedor may be calculated in one TTFA-LIC step. For an n-component mixture, n constraint matrices must be generated, one for each of the n test vectors. Combination of the resulting n solution vectors produces the n X n transformation matrix, [ T I , which is used to calculate the predicted concentration matrix and spectra according to eq 8 and 9. The height of the elution profiles calculated in eq 8 is arbitrary. The appropriate scaling factor, Si, for the ith elution profile is calculated according to eq 28, where Epred,i is the

predicted spectrum of the ith component and c is number of

2880

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13, NOVEMBER 1986

Table I. Effect of Peak Separation on Accuracy Using a Mixture of Adenylic and Guanylic Acid At, s 5.12 3.36 2.72 2.08 1.44

R, 0.64 0.42 0.34 0.26 0.18

adenylic peak areaa guanylic peak area" actual found % error actual found % error 2.690 2.690 2.690 2.690 2.690

2.690 2.689 2.684 2.679 2.638

0.002 0.03 0.2 0.4 1.8

2.807 2.809 2.809 2.809 2.809

2.807 2.810 2.814 2.819 2.861

0.002 0.02 0.2 0.3 1.8

= 4.0 S.

Table 111. Effect of Peak Skew on Accuracy Using a Mixture of Adenylic and Guanylic Acid skew wb/wf

R,

1.2 1.3 1.45

0.43 0.39 0.32

adenylic peak area

guanylic peak area

%

%

actual found error actual found error 2.38 2.67 3.13

2.37 2.62 2.89

0.4 1.9 7.7

2.48 2.79 3.27

2.49 2.84 3.51

0.3 1.8 7.4

Table IV. Effect of Random Noise on Accuracy of Results

a~ 1 1 2

Table 11. Results for Three- and Four-Component Mixtures peak area" found

component

actual

adenylic guanylic uridylic

2.690 2.809 1.920

2.597 2.965 1.857

3.4 5.6 3.3

adenylic guanylic uridylic cytidylic

2.690 2.809 1.920 2.859

2.593 2.942 2.044 2.698

3.6 4.7 6.2 5.6

SIN 1OOO:l 1OO:l 20:l

90 error

(29)

In this paper, the integrated peak area of the ith component is approximated by use of eq 30, where r is the number of rows in the original data matrix.

area, = S I C C ' p r e d , i

2.709 2.772 3.488

0.7 3.0 30.0

2.809 2.809 2.809

2.800 2.831 2.515

0.3 0.8 14.0

= 3.0, wlIz = 4.0 s, and At = 3.52 s.

columns (wavelengths) in the original data matrix. The n scaling factors are then used to adjust the corresponding elution profiles to their proper height,

= SiCpred,i

2.690 2.690 2.690

"R. = 0.42; U J ? , ~= 4.0 S: A t = 3.36 S.

" R , = 0.42; w i p = 4.0 S; At = 3.36 S.

C'pred,i

adenylic peak area" guanylic peak area" actual found % error actual found % error

(30)

i=l

EXPERIMENTAL SECTION Several programs were written using Microsoft FORTRAN on an IBM PC to develop and test the TTFA-LIC curve resolution method. These included programs to generate simulated data, select a raw data matrix from binary data files, perform principal component analysis of the raw data matrix, and perform TTFA-LIC curve resolution. Simulated chromatographic data were generated by using the UV spectra of adenylic, guanylic, cytidylic, and uridylic acid over a wavelength range of 200-290 nm at intervals of 10 nm (25). The simulated data in Table I were generated by use of overlapped Gaussian elution profiles of adenylic and guanylic acid to test the effect of peak separation on the accuracy of the results. The retention times of Gaussian elution profiles were adjusted to achieve various degrees of peak separation ( A t = 5.12-1.44 s) and chromatographic resolution. The width of the Gaussian elution profiles a t half-height was wl/p = 4.0 s. Normally distributed random numbers were scaled to simulate various signal-to-noise ratios in Table 11. The random numbers were added to the spectra generated from two overlapped Gaussian elution profiles (urIjz = 4.0 s, At = 2.72 s) of adenylic and guanylic acid. The signal-to-noise ratio was adjusted relative to the maximum absorbance datum in the peak cluster; consequently, the signal-to-noise ratios a t all other points in the peak cluster are much worse. Skewed elution profiles for two overlapped peaks of adenylic and guanylic acid were simulated by use of the generalized exponential function of Vaidya and Hester (26). The data used in Table I11 were generated by using values of 2.5, 2.0, and 1.5 for the variable A to give a skew of 1.2, 1.3, and 1.45, respectively. The skew was estimated from plots of the resulting curves using the ratio of the back to front peak half-widths at 10% of peak height. Other parameters for generating skewed peaks were B

Simulated data sets for three and four overlapping peaks used in Table IV were produced by using Gaussian elution profiles (w112 = 4.0 s, At = 3.36 s) of adenylic, guanylic, uridylic, and cytidylic acid. All experimental data were obtained with an HPLC system consisting of a single-piston reciprocating pump (LDC Milton Roy Model 3961, a manual 5-rL loop injector (Rheodyne 725), a 25 cm X 4.6 mm column packed with 5-rm bonded C,, phase (Phenomenex), and an LKB Model 2140 photodiode array detector. Digitized LC/ W data from the photodiode array detector were recorded using the LKB "Wavescan" software package on an IBM PC/XT microcomputer. Experimental data for the two component studies were obtained by using mixed standards of acetophenone and nitrobenzene (Fisher purified) is methanol (Fisher HPLC grade). The amount of peak overlap was adjusted in the two-component study by using a mobile phase of 90:10, 95:5, or 97:3 methanol/water (Fisher HPLC grade). Experimental data for the three-component study were obtained by using mixed standards of benzophenone, toluene, and naphthalene (Fisher purified) in methanol. Mixed standards of benzophenone, toluene, naphthalene, and biphenyl (Fisher purified) in methanol were used for the four-component study. Methanol (100%)was used as the mobile phase.

RESULTS AND DISCUSSION Table I summarizes the results obtained in the two-component simulated study. The integrated peak areas were calculated using eq 30. For comparison, the actual peak areas were calculated from the original, simulated elution profiles. Excellent agreement between the actual and predicted elution profiles was observed. A slight improvement was observed as peak separation increased. T h e results of the simulated studies demonstrating the curve resolution of three and four overlapping peaks are reported in Table 11. Quantitative agreement between the actual and the predicted peak areas was observed. Table 111 summarizes the results obtained using skewed elution profiles. In general, a greater amount of peak tailing leads to poorer estimates of the elution profiles. In the case of the two-component studies using skewed peaks, careful comparison of the predicted and actual elution profiles has shown that the tail of the first peak is consistently too long. The estimated peak heights are more accurate than peak areas in this case, since they are less sensitive to errors in the peak widths. Further refinements in the estimated elution profiles of skewed peaks might be obtained by using the Varimax factor rotation procedure reported by Vandeginste (21) or Gaussian or skewed Gaussian elution profiles as the initial test vectors instead of the uniqueness test.

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13,NOVEMBER 1986

2661

9.m

P

e.m

7.88

6.88

8 a

8 B4

6.88

0

.-c

4.88

2

3.88

-

4

2.88

i .m

e.m, 288

286

210

215

228

6

226

Time (s)

Time (s)

B

Time (s)

Time (s) Flgure 4. Constrained curve resolution results for overlapped acetophenone and nitrobenzene peaks at R , = 0.54: (A) 0.l l l mg/mL acetophenone 0.112 mg/mL nitrobenzene and (B) 0.011 mg/mL acetophenone 0.112 mg/mL nitrobenzene.

Figure 5. Constrained curve resolution results for overlapped acetophenone and nitrobenzene peaks at R , = 0.24: (A) 0.111 mg/mL acetophenone 0.112 mg/mL nitrobenzene and (B) 0.01 1 mg/mL acetophenone 0.112 mg/mL nitrobenzene.

Table IV summarizes the results obtained using random numbers to simulate experimental error. The constraints were adjusted to account for experimental error according to eq 21 and 27. At a signal-to-noise ratio of 1OOO:l and 100:1,the predicted elution profiles and estimated spectra faithfully reproduce the original curves. Poor performance was obtained at a signal-to-noise ratio of 201. Further improvements might be obtained by eliminating those constraints that contain a high degree of uncertainty relative to the magnitude of the signal they represent according to the method reported by Meister (17). The results for the two-component study using experimental data are shown in Table V. The relative ratio of acetophenone

to nitrobenzene was varied to study the performance of the curve resolution algorithm under a variety of conditions. Plots of the estimated elution profiles at R, = 0.54 and R, = 0.24 are shown in Figure 4a,b and Figure 5a,b, respectively. Only the two extremes (acetophenone at maximum and minimum concentration) are shown in the plots. Peak heights are reported in Table V instead of peak areas because of the previously discussed problems with skewed peaks. For comparison, single-component standards were run under identical chromatographic conditions. Chromatographic resolution was estimated from the single-component standards. Uncorrected peaks heights are reported wherever separate shoulders are present in the raw data. The accuracy of the estimated elution

+ +

+ +

2662

ANALYTICAL CHEMISTRY, VOL. 58, NO. 13, NOVEMBER 1986

Table V. Results Using Experimental Data from Two-Component Mixtures of Acetophenone (acphn) and Nitrobenzene (nitrb) concn, mg/mL acphn nitrb

uncorrected peak height acphn nitrb

curve resolution peak height acphn nitrb

pure stds peak height acphn nitrb

Mobile Phase = 9010 methanol/water, R, = 0.84 0.111 0.055 0.022 0.011

0.112 0.112 0.112 0.112

3.94 1.95 0.81 0.41

0.111 0.055 0.022 0.011

0.112 0.112 0.112 0.112

4.91 2.61 a a

4.90 4.54 4.41 4.33

3.96 1.97 0.82 0.41

4.20 3.52 4.15 4.15

3.73 1.87 0.77 0.38

4.15 4.15 4.15 4.15

4.55 2.41 0.98 0.49

5.39 5.39 5.39 5.39

4.44 2.45 0.98 0.50

5.56 5.56 5.56 5.56

Mobile Phase = 95:5 methanol/water, R, = 0.54 7.12 6.20 5.76 5.65

4.87 2.58 1.09 0.59

6.27 5.64 5.46 5.56

Mobile Phase = 97:3 methanol/water, R, = 0.24 0.111 0.055 0.022 0.011

0.112 0.112 0.112 0.112

a a a a

8.75 7.13 6.10 5.85

5.00 2.56 1.34 0.85

6.99 6.30 5.18 5.23

Separate acetophenone shoulder not present in raw data. 0.68

0.68

0.40

8.48

u

P) 0

m 0

0.38

0.33

n

n

0

0

B4

B 4

al

-d

0

d ."

.I

4

m

&

0.28

0

8.28

0. 10

0 . 10

8.88

O.m 5

Time ( 5 ) Figure 6. Constrained curve resolution results for overlapped benzo-

phenone, toluene, and naphthalene peaks. profiles was observed to decrease as chromatographic resolution became worse. Accuracy also decreased as the ratio of acetophenone to nitrobenzene decreased. Figures 6 and 7 show plots demonstrating the curve resolution of three and four overlapped components. The constraints to suppress secondary maxima were relaxed according to eq 27; consequently, slight shoulders appear in the results. The curve resolution algorithm reported here has been observed to fail under several circumstances. Refinements in addition to the ones mentioned above will be necessary before it can be used as a general purpose tool. Specifically, imposing positive constraints in the concentration domain does not guarantee prediction of positive absorption spectra, and in one case, a negative absorption spectrum was obtained. A linear programming method is needed that will alow con-

Time (s)

Figure 7. Constrained curve resolution results for overlapped benzo-

phenone, toluene, naphthalene, and biphenyl peaks. straints in both the concentration and spectral domain. Negative points in the raw data matrix adversely affect the results; therefore, proper background correction of the raw data is essential. The effect of base-line drift has not been studied; however, in view of the previous observation, it is expected to present difficult problems. Finally, no attempts have been made to study the effect of similarity between the UV spectra of overlapped components on the accuracy of the results. Further work will be necessary to assess its effect. If these difficulties can be overcome through persistent refinements, a robust curve resolution technique may eventually be realized.

ACKNOWLEDGMENT The efforts of undergraduate students William Stonestreet and Donald Westbrook are acknowledged for their assistance

Anal. Chem. 1986, 58, 2663-2668

in obtaining experimental data.

LITERATURE CITED (1) W i n g s , J. C.; Davis, J. M. Anal. Chem. 1985, 5 7 , 2166-2177. (2) Guiochon, G.; Herman, D. P.; Gonnord, M. F. Anal. Chem. 1984, 56, 995-1 003. (3) Lawton, W. H.; Sylvester, E. A. Technometrics 1971, 13, 617-633. (4) Macnaughtan, D.; Rogers, L. 8.; Wernlmont, G. Anal. Chem. 1972, 4 4 , 1421-1427. (5) Kowalski, B. R.; Sharaf, M. A. Anal. Chem. 1982, 5 4 , 1291-1296. ( 6 ) Kowalski, B. R.; Osten, D. W. Anal. Chem. 1984, 56, 991-995. (7) Gemperline, P. J. J . Chem. I n f . Comput. Sci. 1984, 2 4 , 206-212. (8) Knorr, F. J.; Futrell, J. H. Anal. Chem. 1979, 5 1 , 1236-1241. (9) Halket, J. M. A&. Mass Spectrom. 1980, 8 , 1554-1563. (10) Tway, P. C.; Cline-Love, L. J. Anal. Chim. Acta 1980, 777, 45-52. (11) McCue, M.; Malinowski, E. R. Appi. Spectrosc. 1983, 3 7 , 463-469. (12) Malinowski, E. R. Anal. Chem. 1977, 49, 606-612. (13) Malinowski, E. R. Anal. Chem. 1977, 49, 612-617. (14) Kowalski, B. R.; Borgen, 0. S. Anal. Chim. Acta 1985, 774, 1-26. (15) Chen, J.-H.; Hwang, L.-P. Anal. Chim. Acta 1981, 133, 271-281. (18) Vandeglnste, B.; Essers, R.; Bosman, T.; Reijnen, J.; Kateman, G. Anal. Chem. 1985, 5 7 , 971-985.

2663

(17) Mister, A. Anal. Chim. Acta 1984, 161, 149-161. (18) Knorr, F. J.; Thorsheim, H. R.; Harris, J. M. Anal. Chem. 1981, 53, 821-825. (19) Harris, J. M.; McConnell, M. L.; Frans, S. D. Anal. Chem. 1985, 5 7 , 1552-1559. (20) Delaney, M. F.; Mauro, D. M.; Anal. Chim. Acta 1985, 172, 193-205. (21) Vandeginste, B.; Derks, W.; Kateman, G. Anal. Chim. Acta 1986, 173, 253-264. (22) Malinowski, E. R.; Howery, D. 0. Factor Analysis in Chemistry; Wiley: New York, 1980; pp 50-52. (23) Lawson, C. L.; Hanson, R. J. Solving Least-Squares Problems; Prentice-Hall: Englewood Cliffs, NJ, 1974; Chapter 23. (24) Malinowski, E. R. Anal. Chim. Acta 1981, 133, 99-101. (25) Kalivas, J. H. Anal. Chem. 1983, 55, 565-567. (26) Vaidya, R. A.; Hester, R. D. J . Chromatogr. 1984, 287, 231-244.

RECEIVED for review April 29, 1986. Accepted July 1, 1986. Acknowledgment is made to the donors of the Petroleum Research Fund, administered by the American Chemical Society, and to the Burroughs Wellcome Co. for partial support of this research.

Evaluation of Methods for Measuring Gas-Solid Chromatographic Retention on Skewed Peaks J. R. Conder,* Sonia McHale, and M. A. Jones Department of Chemical Engineering, Univerisity College of Swansea, Singleton Park, Swansea SA2 8PP, Great Britain

A major problem in using gas-solid chromatography is that peaks are frequently skewed. Several methods of measuring retention on skewed peaks have been described In the ilterature. They are evaluated for the common circumstance when the skewing is due to curvature of the dlstribution isotherm. Retentlons estimated by five methods are compared with the true infinite dilution retentions for five GSC systems over a range of skew ratlos from l/lto 1/5. The results are collated with previous GLC findings. Using the peak maximum alone can give very large errors, up to about 40%, with skewed peaks. Three of the methods almost always give better, and sometimes far better, resuns, but their advantage Is less consistently marked In GSC than GLC. Overall, the method of Littiewood, Phillips, and Price is the most successful though It may be equaled or bettered by one of the other methods in particular cir'cumstances.

Gas-solid chromatography (GSC) has recently been returning to favor because it commonly offers a wider range of selectivity of related molecular types than does gas-liquid chromatography (GLC) ( I ) . The major, and long-standing, problem with GSC is that peaks are frequently asymmetrical, with a concentration-dependent retention, even at the limits of detector sensitivity. Besides reducing efficiency, this can seriously interfere with peak identification, which depends on accurate determination of the retention volume. I t also creates complications in applications to physicochemical measurement. Of the great variety of physicochemical parameters which can be measured by GC, the majority depend on obtaining the retention of an essentially symmetrical Gaussian peak (2). There are many causes of peak distortion (3). In GSC the most common is skewing due to curvature of the adsorption

isotherm of the adsorbate (solute sample). In this case the time to the peak maximum varies with sample size and fails to provide a measure of the retention a t zero coverage of adsorbate (usually termed infinite dilution by analogy with gas-liquid chromatography (GLC)). The diagnosis of isotherm skewing as the cause of peak distortion and methods of dealing with it have been described elsewhere (3). Unfortunately, it is not always feasible to move to a linear regime by raising or lowering the concentration or changing the temperature. If one must accept that the peaks are skewed, the problem becomes how to use them to give a good estimate of the retention time, tR, at infinite solute dilution or zero coverage of the adsorbate. Three approaches are available. First, purely mathematical expressions for the peak profile may be fitted to the peak. This approach (reviewed, e.g., in ref 3-6) usually takes little account of the physical origin of peak distortion; some peaks are predominantly skewed, due to isotherm curvature, while others, more rarely, are tailed, due to the kinetics of adsorption and desorption (7). The exponentially modified Gaussian model (see, e.g., ref 8) is appropriate only to tailed peaks. Other models, though less restricted, do not readily allow the infinite dilution tR to be obtained from the skewed profile. The second approach is to use a physically based mathematical model. This approach is difficult because of the mathematical complexity inherent in modeling nonlinear, nonideal chromatography satisfactorily in the presence of a large number of influences on the peak shape. (We are not concerned with the simpler, extreme case of nonlinear, ideal chromatography where the peak has an infinite skew with one side vertical (9, IO).) Many attempts at modeling have been made (6, 11-16), three of which lead to the possibility of estimating tR from skewed peak measurements (6,12,14). Of these the recent model of Jaulmes et al. (6) is the most developed. Though the model takes into account more peak

0003-2700/86/0358-2663$01.50/0 0 1986 American Chemical Society