
Anal. Chem. 1985, 57, 2168-2177 (Analytical Chemistry, Vol. 57, No. 12, October 1985). © 1985 American Chemical Society

Statistical Method for Estimation of Number of Components from Single Complex Chromatograms: Theory, Computer-Based Testing, and Analysis of Errors

Joe M. Davis and J. Calvin Giddings*

Department of Chemistry, University of Utah, Salt Lake City, Utah 84112

A procedure based on the statistical model of overlap is introduced by which the number $m$ of detectable components (or single-component peaks) in a single chromatogram having substantial overlap can be approximated. In essence, it is shown that the distribution of the intervals or spacings between adjacent chromatographic maxima provides the information necessary to estimate $m$. The procedure is tested using computer-generated chromatograms in which the density, asymmetry, and amplitude distribution of the single-component peaks are widely varied. An analysis of the error in component numbers estimated using the procedure is presented. It is shown that the attributes of single-component peaks cited above independently influence the $m$ values calculated from theory. Limitations on the procedure's use are discussed.

The ubiquitous overlap of a substantial fraction of single-component peaks (SCPs) in chromatograms produced from many-component samples has plagued analysts since the beginning of chromatography. Notwithstanding the steady advance in column technology and instrumental design, many complex mixtures still cannot be resolved completely. One approach to this problem is to separate such mixtures partially with highly efficient columns and then to use various data reduction or statistical methods to recover some of the information in the chromatogram lost by overlap. Statistical moments, for example, have been used as one criterion of overlap (1). Factor analysis has also been used by many groups over the last decade or so to determine the number of components in an eluting peak (2-6). More recently, simple models have been introduced which estimate the degree of overlap from an assumed statistical distribution of SCPs with respect to the retention-time or -volume axis (7, 8).

The authors recently developed a statistical model of overlap (SMO) based on the assumption that the retention times or volumes of SCPs in complex chromatograms are distributed randomly according to a Poisson process (8). (By complex chromatogram, we mean a chromatogram of many components with a substantial overlap of the SCPs.) In the development of this model, the retention times (or volumes) of SCPs were identified with points distributed on a retention-time (or -volume) axis, and it was further assumed that each SCP was successfully separated or identified if there existed some minimum spacing $x_0$ (subject to arbitrary assignment) between it and its closest neighbors. The analysis of both computer-generated and experimental chromatograms has demonstrated that the SMO is broadly applicable (9-11).

In the first part of this paper, we introduce a procedure based on the SMO which makes it possible to estimate the number $m$ of SCPs (that is, the number of detectable components) from a single complex chromatogram, namely, from the distribution of the retention times of the observed chromatographic maxima. This procedure is then characterized by its application to computer-generated chromatograms containing Gaussian and exponentially convoluted Gaussian SCPs. The $m$ values so estimated are generally good but are somewhat subject to systematic errors. In the second part of the paper, a rather rigorous analysis of the origins of these errors is presented. It is shown using analysis of variance that select attributes of the SCPs significantly influence the $m$ values calculated from theory and that the relative error in $m$ is largely independent of $m$ for reasonable $m$ values. A working range for the procedure is established and some practical limitations are noted. In a following paper, we apply the procedure developed here to estimate the numbers of components in three experimental chromatograms. Evidence that the estimates are reasonable is provided by criteria developed in this paper.

THEORY

The expected number $p$ of "peaks" in a chromatogram containing $m$ randomly distributed SCPs is related to the average (or expectation) number $\bar{m}$ of SCPs in the chromatogram and to the column's peak capacity $n_c$ by (8)

$$p = \bar{m} e^{-\bar{m}/n_c} = \bar{m} e^{-\alpha} \tag{1}$$

In this equation, the saturation factor $\alpha = \bar{m}/n_c$ is a measure of the average saturation of the separation coordinate by overlapping SCPs, and the peak capacity $n_c$ is the theoretical maximum number of SCPs which can be resolved to the specified level $x_0$ in the space (time, volume, or distance) $X$ of interest, as measured along the separation coordinate

$$n_c = X/x_0 = X/(4\sigma R_s^*) \tag{2}$$

In eq 2, $\sigma$ is the average standard deviation of adjacent SCPs and $R_s^*$ is the critical resolution (subject to arbitrary assignment) beyond which adjacent SCPs are resolved to satisfaction into different "peaks". In the context of the SMO, a "peak" is a cluster of one or more SCPs in which (a) the separation between the maxima (or centers of gravity) of adjacent SCPs in the cluster is less than $x_0$ and (b) the separation between all maxima in the cluster and all maxima not in the cluster is greater than $x_0$ (8). A "peak" is not, therefore, necessarily identified with the response from a single chemical species (an SCP) or even with a chromatographic maximum.

Our goal is to estimate the number $m$ of detectable components represented in a chromatogram using eq 1, which depends on the statistical quantity $\bar{m}$. Thus we must use $\bar{m}$ as an approximation to $m$. With large $m$'s, the average fractional difference between $m$ and $\bar{m}$ becomes small, and the use of $m = \bar{m}$ has increasing statistical validity. Equation 1 can be linearized by taking logarithms, giving the form

$$\ln p = \ln \bar{m} - \bar{m}/n_c \tag{3}$$

The analysis of computer-simulated and experimental chromatograms containing randomly distributed SCPs has verified that (under appropriate conditions) plots of the logarithm of the number $p$ of observed "peaks" in the chromatograms vs. reciprocal peak capacity $1/n_c$ provide reasonable component-number estimates both from the slope ($-m$) and intercept ($\ln m$) of eq 3 (9-11). This approach is the basis of our earlier method for estimating component numbers from a series of complex chromatograms (8).

We introduce now the concepts with which the number of randomly distributed SCPs (detectable components) can be estimated from a single chromatogram. (An alternative procedure for this objective was recently proposed by Martin and Guiochon (12).) For purposes of clarity, we assume first that each SCP in the chromatogram is a delta function, thus infinitely narrow and totally resolved from other SCPs (see Figure 1a). We term this ideal chromatogram a line chromatogram. The spacings $S$ between adjacent SCPs in a line chromatogram are distributed according to the probability function $P(S)$ from Poisson statistics (8)

$$P(S) = \lambda e^{-\lambda S} \tag{4}$$

where $\lambda = \bar{m}/X$. The probability that the spacing $S$ between adjacent SCPs exceeds any arbitrary value $x_0'$ is

$$P(S > x_0') = \int_{x_0'}^{\infty} \lambda e^{-\lambda S}\, dS = e^{-\lambda x_0'} \tag{5}$$

where quantities $S$ and $x_0'$ may have any convenient units (such as time, elution volume, or length); we shall henceforth adopt units of time. The expected number of spacings greater than $x_0'$ is this probability multiplied by the expected number of spacings, which can be taken as the expected number $\bar{m}$ of components if end effects are neglected. Thus the expected number $p'$ of spacings greater than $x_0'$ is approximately

$$p' = \bar{m} e^{-\lambda x_0'} = \bar{m} e^{-\bar{m} x_0'/X} \tag{6}$$

Quantity $p'$ also equals the expected number of "peaks" in the line chromatogram when the minimum spacing which is considered necessary to resolve adjacent SCPs into "peaks" equals the arbitrary spacing $x_0'$; $n_c' = X/x_0'$ is an effective "peak capacity" (see eq 2) based on the $x_0'$ spacing. Similarly, $\lambda x_0' = \alpha'$ defines an effective saturation factor based on the value $n_c'$. Thus we have a series of primed parameters ($p'$, $n_c'$, and $\alpha'$) that relate to the arbitrary spacing $x_0'$ in the same way that $p$, $n_c$, and $\alpha$ depend on the minimum spacing $x_0$ needed for suitable SCP resolution. Since $x_0'$ is arbitrarily chosen, a set of data points ($x_0'/X$, $\ln p'$), or equivalently ($1/n_c'$, $\ln p'$), can be derived from a single line chromatogram. Here, $p'$ is the observed number of spacings between adjacent SCPs exceeding $x_0'$ or, equivalently, the observed number of "peaks" corresponding to $x_0'$. The data set so obtained can then be fit to a relationship fully analogous to eq 3

$$\ln p' = \ln \bar{m} - \bar{m}/n_c' \tag{7}$$

or, in terms of $x_0'$

$$\ln p' = \ln \bar{m} - \bar{m} x_0'/X \tag{8}$$

From the resulting plot of $\ln p'$ vs. $x_0'/X$, one can estimate the number of components from either slope or intercept, as shown in Figure 1b. (We recognize that $m$ could be determined directly from a line chromatogram by counting the SCPs; the procedure outlined above for estimating $m$ is needed with real chromatograms in which the finite width and consequent overlap of the SCPs make an exact count of $m$ impossible.) The evaluation of $p'$ for an arbitrary value of $x_0'$ is illustrated in Figure 1a, where we show three gaps ($p' = 3$), labeled a, b, and c, which are wider than the illustrated value of $x_0'$. Clearly, $p'$ will increase as $x_0'$ is reduced.

Figure 1. Applications of statistical model of overlap (SMO) to single chromatograms. Parts a, c, and e are, respectively, line, unsaturated, and saturated chromatograms; the broken curves in parts c and e represent amplitude functions. Parts b, d, and f are the corresponding $\ln p'$ vs. $x_0'/X$ plots; the solid and broken lines represent the theoretical predictions of the SMO and least-squares fits to eq 8, respectively. In Figure 1a, the observed "peak" number $p'(x_0'/X) = 3$ since spacings a, b, and c > $x_0'$.

The above procedure is now extended to a more realistic chromatogram containing overlapping SCPs which form a complex amplitude function (Figure 1c). The spacings $S$ between adjacent SCPs must now be approximated by the spacings between the observed adjacent maxima, that is, by the differences in the retention times of adjacent maxima. Because closely adjacent SCPs overlap and conceal individual SCP maxima, $p'$ is largely insensitive to changes in $x_0'$ when $x_0'$ is small. In other words, small gaps between SCPs cannot be seen and counted, and thus at some point the observed count $p'$ ceases to increase as $x_0'$ decreases. In general, quantity $p'$ is (essentially) invariant and approximately equal to the number $p_m$ of maxima unless $x_0'$ is larger than the average minimum spacing $x_0^*$ required between adjacent SCPs to detect distinct maxima. In unsaturated chromatograms, $x_0^*$ is typically given by eq 2 with $R_s^* = 0.5$ (9-11): $x_0^* = 2\sigma$.

In relatively unsaturated chromatograms, only closely adjacent SCPs overlap and conceal the gap between them. The retention times of two adjacent, well-resolved maxima are thus not much different from the retention times of the two SCPs which are closest to each other, as shown in Figure 2a.
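The line-chromatogram procedure behind eqs 5-8 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' FORTRAN implementation: the function names (`count_exceeding`, `fit_line`, `estimate_m`) are hypothetical, the fit is an ordinary unweighted least-squares fit (the paper weights each datum as described in its Appendix), and no plateau rejection or lower limit on $p'$ is applied here.

```python
import math

def count_exceeding(times, x0p):
    """p': number of spacings between adjacent retention times that exceed x0p."""
    ts = sorted(times)
    return sum(1 for a, b in zip(ts, ts[1:]) if b - a > x0p)

def fit_line(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def estimate_m(times, X, x0p_values):
    """Fit ln p' vs x0'/X (eq 8); return (m from slope, m from intercept)."""
    xs, ys = [], []
    for x0p in x0p_values:
        p = count_exceeding(times, x0p)
        if p > 0:  # ln p' is undefined when no spacing exceeds x0'
            xs.append(x0p / X)
            ys.append(math.log(p))
    slope, intercept = fit_line(xs, ys)
    return -slope, math.exp(intercept)
```

Calling `estimate_m(times, X, x0p_values)` with the retention times of the maxima, the span $X$, and a list of trial spacings returns the two estimates that the text calls the slope and intercept estimates; for real (overlapping) chromatograms only trial spacings above the plateau should be supplied.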
Thus, the spacings $S$ between well-resolved adjacent SCPs, along with the number of such spacings, are only slightly distorted by overlap at low saturations. Therefore, the distribution of gaps and the numbers $p'$ evaluated from line chromatograms and relatively unsaturated chromatograms are similar if we consider only the range $x_0' > x_0^*$. The ratio $x_0^*/X$ can be evaluated directly from a plot of $\ln p'$ vs. $x_0'/X$. On the left-hand side of Figure 1d, $\ln p'$ is (essentially) invariant; on the right-hand side, $\ln p'$ varies strongly with $x_0'/X$. Only data lying to the right of this invariant plateau are fit to eq 8.

Figure 2. Approximation of adjacent SCP spacing (bottom diagrams) by adjacent maxima spacing (top diagrams). In Figure 2a, $S_1 = S_2$; in Figure 2b, $S_3 > S_4$.

In contrast, in highly saturated chromatograms the spacings between adjacent maxima are considerably larger than the spacings between adjacent SCPs, as shown in Figures 1e and 2b. The numbers $p'$ evaluated from these chromatograms are consequently larger than the predictions of eq 1. As an example, the arbitrary value $x_0'$ shown in Figure 2b is greater than the spacing $S_4$ between the indicated SCPs but is less than the spacing $S_3$ between the observed adjacent maxima. Thus, we erroneously count this spacing or gap as part of $p'$ for that $x_0'$. It is difficult to describe mathematically the resultant error. Computer simulations indicate that the counting errors increase with $x_0'/X$ and that plots of $\ln p'$ vs. $x_0'/X$ produced from highly saturated chromatograms yield slopes whose values are depressed relative to theory (see Figure 1f).

It is noted that only the retention times of the observed maxima are required in this procedure to determine the observed numbers $p'$ for the chromatogram; no data on SCP width are required. These retention times are perhaps most easily obtained from the microprocessor-based data systems ancillary to much present-day GC and LC instrumentation. The (almost) universal ease with which the necessary data are generated is a strong incentive for using the method developed here as opposed to our earlier procedure (9-11). (The details of constructing $\ln p'$ vs. $x_0'/X$ plots from such retention-time data are summarized in a following paper.)

PROCEDURES

The computer-generated chromatograms produced to characterize the method detailed above contained either randomly distributed Gaussian or exponentially convoluted Gaussian (ECG) single-component peaks of equal width. The amplitude functions $h(t)$ formed from a sum of $m$ of these SCPs are, respectively, the summations

$$h(t) = \sum_{n=1}^{m} A_n \exp\left[-\frac{(t - t_{rn})^2}{2\sigma_n^2}\right] \tag{9}$$

and

[eq 10, the corresponding sum of exponentially convoluted Gaussians written in terms of $z$, is not legible in the source]

where

$$z = \frac{1}{\sqrt{2}}\left[\left(\frac{t - t_{rn}}{\sigma_n}\right) - \left(\frac{\sigma_n}{\tau_n}\right)\right] \tag{11}$$

For the $n$th SCP in the above expressions, $A_n$ is the relative amplitude, $t_{rn}$ is the retention time of the $n$th Gaussian SCP, $\sigma_n$ is the standard deviation (in units of time $t$), and $\tau_n$ is the time constant of the exponential function $\exp(-t/\tau_n)$ which is convoluted with the $n$th Gaussian SCP. The retention times of the SCPs were distributed randomly such that for the $n$th SCP

$$t_{rn} = t_{\min} + (t_{\max} - t_{\min}) X_n \tag{12}$$

where $t_{\max}$ and $t_{\min}$ are the maximum and minimum possible retention times and $X_n$ is a uniformly distributed random number between zero and one. Technically, the above procedure does not yield a Poisson distribution because $m$ rather than $\lambda$ (and thus $\bar{m}$) is fixed. However, the approximation should be very good for large $m$ (9).
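The simulation set-up described above can be sketched as follows. This is an illustrative reading under stated assumptions, not the original code: the helper names are hypothetical, the Gaussian profile $A_n \exp[-(t - t_{rn})^2/2\sigma^2]$ is assumed as the SCP shape, tails beyond six standard deviations are dropped for speed, and `span_for_saturation` assumes $x_0 = 4\sigma R_s^*$ (so that $x_0^* = 2\sigma$ when $R_s^* = 0.5$, as stated in the text) with $\alpha = m x_0 / X$.

```python
import math
import random

def random_retention_times(m, t_min, t_max, rng):
    """Eq 12: t_rn = t_min + (t_max - t_min) * X_n with X_n uniform on [0, 1)."""
    return sorted(t_min + (t_max - t_min) * rng.random() for _ in range(m))

def span_for_saturation(m, sigma, alpha, rs_crit=0.5):
    """Span X giving saturation alpha = m / n_c, assuming n_c = X / (4 sigma Rs*)."""
    return 4.0 * sigma * rs_crit * m / alpha

def gaussian_chromatogram(times, amplitudes, sigma, t_grid):
    """Sum of equal-width Gaussian SCPs evaluated at each time in t_grid."""
    h = []
    for t in t_grid:
        total = 0.0
        for amp, tr in zip(amplitudes, times):
            d = (t - tr) / sigma
            if abs(d) < 6.0:  # assumption: ignore negligible far tails
                total += amp * math.exp(-0.5 * d * d)
        h.append(total)
    return h
```

For example, with $m = 200$, $\sigma = 4$ s, and $\alpha = 0.5$, `span_for_saturation(200, 4.0, 0.5)` gives a span of 3200 s, over which `random_retention_times` can then scatter the SCPs.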

The amplitudes of the SCPs were distributed in one of three ways. The bulk were distributed randomly such that

$$A_n = A_{\min} + (A_{\max} - A_{\min}) Y_n \tag{13}$$

where $A_{\max}$ and $A_{\min}$ are the maximum and minimum possible amplitudes and $Y_n$ is a uniformly distributed random number between zero and one which is different from $X_n$. The SCP amplitudes in the remaining chromatograms were either distributed exponentially or held constant; these latter two amplitude distributions have not previously been considered in our testing of the SMO (9, 10). These three amplitude functions were chosen independently by Herman et al. to characterize the SMO (11). The number of SCPs in each simulation was arbitrarily chosen as $m$ = 100, 200, or 300. The standard deviation $\sigma_n$ of each SCP was arbitrarily chosen as 4 s; hence, $\sigma$ = 4 s. (This constancy represents the most ideal distribution of SCP widths possible in temperature-programmed GC and solvent-programmed LC.) Quantity $X$, the time between the first and last SCP in the chromatogram, was adjusted to generate different degrees of saturation $\alpha$. The $\sigma/\tau$ ratios, which measure the SCP asymmetries, were held constant throughout each chromatogram.

The approximate retention times of the maxima in each chromatogram were determined by a sequential scanning of the amplitude function $h(t)$ expressed in digital form, $h(i)$, where $i$ ranged from 1 to 15000. The array $h(i)$ was first filtered digitally with a 9-point Savitzky-Golay routine (13) to remove discontinuities in the amplitude function arising from truncating eq 9 and 10. The retention time of each maximum was then calculated by parabolic interpolation among each of the locally largest elements $h(i)$ and the smaller adjacent elements $h(i - 1)$ and $h(i + 1)$. The differences between the retention times of adjacent maxima were then calculated for comparison to arbitrarily chosen values of $x_0'$ to generate data for plots of $\ln p'$ vs. $x_0'/X$. No value of $p'$ less than 15 was used, to minimize problems associated with the statistics of small numbers and the failure to account for end effects in the model's development (8).
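The maximum-finding step just described can be sketched as below. The sketch is hedged in two ways: the paper does not state the polynomial order of its 9-point Savitzky-Golay routine, so the standard quadratic/cubic smoothing weights (-21, 14, 39, 54, 59, 54, 39, 14, -21)/231 are assumed, and the function names and the handling of the four end points (left unsmoothed) are illustrative choices.

```python
# Standard 9-point quadratic/cubic Savitzky-Golay smoothing weights (divide by 231).
SG9 = [-21, 14, 39, 54, 59, 54, 39, 14, -21]

def savgol9(h):
    """Smooth h with the 9-point filter; the four points at each end are left as-is."""
    out = list(h)
    for i in range(4, len(h) - 4):
        out[i] = sum(c * h[i + k - 4] for k, c in enumerate(SG9)) / 231.0
    return out

def maxima_times(h, t0=0.0, dt=1.0):
    """Retention times of local maxima, refined by three-point parabolic interpolation."""
    times = []
    for i in range(1, len(h) - 1):
        if h[i] > h[i - 1] and h[i] >= h[i + 1]:
            # Vertex of the parabola through (i-1, i, i+1); denom is always negative here.
            denom = h[i - 1] - 2.0 * h[i] + h[i + 1]
            offset = 0.5 * (h[i - 1] - h[i + 1]) / denom
            times.append(t0 + (i + offset) * dt)
    return times
```

The differences between successive entries of `maxima_times(savgol9(h))` are then the spacings that are compared with the trial values of $x_0'$.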
No values of $p'$ greater than 0.85$m$, which were values commensurate with the largest numbers of observable maxima in chromatograms containing overlapping SCPs, were used in fitting data from line chromatograms to eq 8. Plots of $\ln p'$ vs. $x_0'/X$ were produced for visual inspection and for assistance in fitting data to theory. Figure 3 is a schematic of the procedure used in this study for separating the plateau and sloping regions of a $\ln p'$ vs. $x_0'/X$ plot. These two regions could alternatively have been separated by visual inspection of the plot. The procedure depicted by Figure 3 was adopted to reduce subjective biases.

Figure 3. Procedure for separating plateau and sloping regions of $\ln p'$ vs. $x_0'/X$ plot. Points for which $p' < 25$ and which do not appear to belong to the linear relationship (e.g., points a and b) may be rejected. Lines A-F are least-squares fits of points 1-5, 1-6, 1-7, etc., to eq 8. Component numbers are evaluated from line C, for which the reduced sum of squared residuals (RSSR) is still small.

The points forming the sloping region of each $\ln p'$ vs. $x_0'/X$ plot were then fit to eq 8 by least squares (14). The weight assigned to each datum is discussed in the Appendix. The algorithm for these computations and comparisons was written in the FORTRAN language and was executed on a DEC-20 computer operating under TOPS 20 5(4747) monitor. Any graphics required for the confirmation of the algorithm were generated with the plotting routine PLOT 79.

CHARACTERIZATION OF COMPUTER-GENERATED CHROMATOGRAMS

Table I reports data evaluated from or characterizing 43 sets of computer-generated chromatograms. The third column of this table identifies the SCP amplitude distribution function (ADF) for the six chromatograms comprising each set as uniformly random (ran), exponential (exp), or constant (con). The $\alpha$ values reported in the sixth column are arbitrarily calculated from eq 2 with $R_s^* = 0.5$, which defines the peak capacity $n_c$ in terms of observable maxima (9-11). (These $\alpha$ values are based on the fixed numbers of SCPs reported in the second column, instead of on a statistical distribution of $m$ values.) The seventh and eighth columns report the means and standard deviations of the numbers of components (SCPs) estimated from the slope ($m_s$) and intercept ($m_i$) of eq 8. The numbers cited in parentheses are the percentage errors between the mean numbers of SCPs calculated from theory and the true numbers. A brief study of these numbers indicates that the $m_s$ and $m_i$ values calculated from eq 8 are generally good. For some cases, however, $m_s$ and $m_i$ clearly differ from $m$; the origins of these differences are discussed below.
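The separation of plateau and sloping regions depicted in Figure 3 can be sketched as a sequence of least-squares fits over a growing set of points. The details below are assumptions made for illustration: points are taken in order of decreasing $x_0'/X$ (so that the first points lie in the sloping region), the fits are unweighted, and the rule "stop when the RSSR exceeds `max_rssr`" with its default threshold is a hypothetical stand-in for the paper's criterion that the RSSR be "still small".

```python
import math

def fit_with_rssr(xs, ys):
    """Unweighted line fit; returns (slope, intercept, reduced sum of squared residuals)."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, ssr / (n - 2)

def select_sloping_region(xs, ys, min_points=5, max_rssr=1e-3):
    """Fit points 1..n for n = min_points, min_points + 1, ...; keep the last fit
    whose RSSR does not exceed max_rssr. xs must be in decreasing order."""
    best = None
    for n in range(min_points, len(xs) + 1):
        slope, intercept, rssr = fit_with_rssr(xs[:n], ys[:n])
        if rssr > max_rssr:
            break  # plateau points have started to distort the fit
        best = (n, slope, intercept)
    return best
```

The returned slope and intercept give the two estimates, $-$slope and $\exp$(intercept), as in eq 8; in practice the threshold would have to be tuned to the scatter of the data.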
Figure 4 is an illustrative series of ten computer-generated chromatograms in which the SCP widths (and thus the saturations of the chromatograms) and asymmetries vary in different subfigures. Subfigures a-d contain Gaussian SCPs characterized by the saturation factors (defined with $R_s^* = 0.5$) $\alpha$ = 0.167, 0.333, 0.500, and 0.667. Subfigures e-g and h-j contain ECG single-component peaks with respective $\sigma/\tau$ ratios of 0.35 and 0.20, which thus have substantial asymmetry. We will arbitrarily define the SCP saturation in these last six subfigures in terms of the equivalent Gaussian SCP density. With this convention, the $\alpha$ values characterizing subfigures e-g and h-j are 0.167, 0.333, and 0.500.

Figure 4. Ten computer-generated chromatograms containing 17 SCPs and characterized by different $\alpha$ and $\sigma/\tau$ ratios. The number $p_m$ of maxima is indicated in each subfigure. Amplitude range was 150-fold. Panel labels recoverable from the figure: $\sigma/\tau$ = 0.35 with $\alpha$ = 0.167, 0.333, 0.500 ($p_m$ = 15, 10, 8); $\sigma/\tau$ = 0.20 with $\alpha$ = 0.167, 0.333, 0.500 ($p_m$ = 15, 10, 7).

Table I. Data Characterizing the Computer-Generated Chromatograms(a)

set  no. of    ADF  amplitude  σ/τ   α             estimate from      estimate from       no. of maxima
     SCPs (m)       range            (R_s* = 0.5)  slope (m_s)        intercept (m_i)     (p_m)
A    200       --   --         --    0             197 ± 16 (-1.5)    193 ± 17 (-3.5)     200.0 ± 0.0
B    200       ran  50         ∞     0.167         200 ± 14 (0.0)     203 ± 16 (1.5)      160.5 ± 2.2
C    200       ran  150        ∞     0.167         200 ± 17 (0.0)     203 ± 19 (1.5)      160.2 ± 3.0
D    200       ran  50         ∞     0.333         197 ± 18 (-1.5)    214 ± 24 (7.0)      132.3 ± 2.7
E    200       ran  150        ∞     0.333         191 ± 18 (-4.5)    204 ± 19 (2.0)      133.7 ± 5.5
F    200       ran  50         ∞     0.500         190 ± 23 (-5.0)    230 ± 43 (15.0)     111.5 ± 2.7
G    200       ran  150        ∞     0.500         190 ± 29 (-5.0)    225 ± 42 (12.5)     113.5 ± 7.9
H    200       ran  50         ∞     0.667         179 ± 22 (-10.5)   254 ± 58 (27.0)     93.3 ± 4.8
I    200       ran  150        ∞     0.667         188 ± 19 (-6.0)    254 ± 41 (27.0)     98.2 ± 5.9
J    200       ran  50         0.35  0.167         188 ± 15 (-6.0)    195 ± 16 (-2.5)     145.5 ± 2.6
K    200       ran  150        0.35  0.167         192 ± 19 (-4.0)    201 ± 23 (0.5)      148.2 ± 2.4
L    200       ran  50         0.35  0.333         178 ± 15 (-11.0)   206 ± 26 (3.0)      113.8 ± 2.7
M    200       ran  150        0.35  0.333         175 ± 13 (-12.5)   200 ± 17 (0.0)      114.0 ± 3.4
N    200       ran  50         0.35  0.500         155 ± 20 (-22.5)   202 ± 50 (1.0)      90.2 ± 4.9
O    200       ran  150        0.35  0.500         129 ± 21 (-35.5)   158 ± 29 (-21.0)    81.5 ± 11.4
P    200       ran  50         0.20  0.167         186 ± 12 (-7.0)    195 ± 14 (-2.5)     142.8 ± 3.8
Q    200       ran  150        0.20  0.167         188 ± 18 (-6.0)    196 ± 21 (-2.0)     143.0 ± 2.8
R    200       ran  50         0.20  0.333         166 ± 8 (-17.0)    189 ± 14 (-5.5)     108.0 ± 1.8
S    200       ran  150        0.20  0.333         169 ± 14 (-15.5)   195 ± 25 (-2.5)     107.5 ± 3.7
T    200       ran  50         0.20  0.500         138 ± 22 (-31.0)   176 ± 49 (-12.0)    83.3 ± 3.8
U    200       ran  150        0.20  0.500         118 ± 19 (-41.0)   149 ± 38 (-25.5)    77.0 ± 10.8
V    200       exp  ∞          ∞     0.167         198 ± 14 (-1.0)    201 ± 15 (0.5)      157.8 ± 3.1
W    200       con  1          ∞     0.167         187 ± 13 (-6.5)    186 ± 14 (-7.0)     162.0 ± 2.6
X    200       exp  ∞          0.20  0.167         183 ± 30 (-8.5)    202 ± 55 (1.0)      131.5 ± 3.6
Y    200       con  1          0.20  0.167         191 ± 15 (-4.5)    197 ± 16 (-1.5)     149.2 ± 2.5
Z    200       exp  ∞          ∞     0.333         199 ± 23 (-0.5)    226 ± 46 (13.0)     127.3 ± 3.6
A1   200       con  1          ∞     0.333         187 ± 19 (-6.5)    194 ± 22 (-3.0)     137.8 ± 4.3
B1   200       exp  ∞          0.35  0.333         148 ± 18 (-26.0)   169 ± 22 (-15.5)    104.2 ± 4.5
C1   200       con  1          0.35  0.333         179 ± 19 (-10.5)   199 ± 27 (-0.5)     120.8 ± 5.9
D1   200       exp  ∞          ∞     0.500         179 ± 23 (-10.5)   213 ± 31 (6.5)      108.7 ± 4.2
E1   200       con  1          ∞     0.500         172 ± 29 (-14.0)   183 ± 44 (-8.5)     117.5 ± 7.8
F1   200       exp  ∞          0.20  0.500         119 ± 14 (-40.5)   139 ± 15 (-30.5)    88.3 ± 8.6
G1   200       con  1          0.20  0.500         149 ± 13 (-25.5)   179 ± 22 (-10.5)    96.7 ± 3.7
H1   200       exp  ∞          ∞     0.667         152 ± 7 (-24.0)    191 ± 9 (-4.5)      91.0 ± 4.5
I1   200       con  1          ∞     0.667         202 ± 24 (1.0)     273 ± 114 (36.5)    92.2 ± 2.0
J1   100       --   --         --    0             99 ± 7 (-1.0)      99 ± 7 (-1.0)       100.0 ± 0.0
K1   100       exp  ∞          ∞     0.333         101 ± 7 (1.0)      112 ± 8 (12.0)      65.2 ± 2.1
L1   100       exp  ∞          ∞     0.500         84 ± 13 (-16.0)    102 ± 18 (2.0)      53.5 ± 1.9
M1   100       exp  ∞          ∞     0.667         68 ± 16 (-32.0)    90 ± 24 (-10.0)     44.3 ± 2.0
N1   300       --   --         --    0             297 ± 8 (-1.0)     298 ± 12 (-0.7)     300.0 ± 0.0
O1   300       exp  ∞          ∞     0.333         282 ± 21 (-6.0)    311 ± 32 (3.7)      196.3 ± 6.4
P1   300       exp  ∞          ∞     0.500         262 ± 13 (-12.7)   311 ± 20 (3.7)      162.7 ± 9.4
Q1   300       exp  ∞          ∞     0.667         230 ± 19 (-23.3)   295 ± 20 (-1.7)     137.5 ± 7.9

(a) Sets A, J1, and N1 report results derived from or characterizing line chromatograms.

METHODOLOGY OF ERROR ANALYSIS

In order to understand better the origins of systematic errors in the $m$ values calculated from theory, select $m_s$ and $m_i$ values were subjected to a series of analyses of variance (ANOVAs) to determine if variation in the estimated component numbers from case to case is statistically significant relative to variation within a case (15). The statistical significance of all variations was judged relative to the Fisher ratios $F$ calculated from the ANOVAs. When the calculated ratio was greater than the critical $F$ ratio for the 95% confidence level, we concluded that the attribute or factor examined by that ANOVA significantly influences component numbers calculated from eq 8. We have used both the one- and two-way Model I ANOVAs with which, respectively, we examined the influences of one and two independent, controllable attributes on the $m$ estimates (15). The interaction Fisher ratio calculated via a two-way ANOVA is a measure of the synergism or interference between the two attributes under study (15).

A prerequisite for the accuracy of ANOVA is uniformity among the variances of the compared sets, or homoscedasticity. By use of Bartlett's homogeneity test (15), a reduced $\chi^2$ statistic (RCSS) was calculated by using variances determined from the $m_s$ and $m_i$ values of the sets compared by ANOVA. These variances were interpreted as heteroscedastic, thus lessening the rigor of the ANOVA, if the calculated RCSS exceeded the critical RCSS for the 95% confidence level and the appropriate number of degrees of freedom. To characterize more fully the origins of error in the $m$ estimates, Snedecor's procedure (15) was used to calculate approximate one-way ANOVAs based on slightly heteroscedastic data, which otherwise could not have been utilized in the error analysis. (With this procedure, the contribution of each set of $m_s$ or $m_i$ values to the sums of squares between and among the compared sets is weighted by the reciprocal variance calculated from the $m_s$ or $m_i$ values.) The numbers $\nu_2$ of degrees of freedom for the calculated $F$ ratios are nonintegral for these cases.
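The two statistics used in this error analysis can be sketched in plain Python. This is an illustration under assumptions rather than the authors' code: the function names are hypothetical, `one_way_anova` is the textbook fixed-effects (Model I) one-way $F$ ratio, and `bartlett_reduced_chi2` assumes that the paper's reduced $\chi^2$ statistic is Bartlett's statistic divided by its $n - 1$ degrees of freedom. Critical values must still be taken from tables, and Snedecor's weighted variant is not shown.

```python
import math

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way fixed-effects ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
    df1, df2 = k - 1, n_total - k
    return (ss_between / df1) / (ss_within / df2), df1, df2

def bartlett_reduced_chi2(groups):
    """Bartlett's homogeneity statistic divided by its k - 1 degrees of freedom.
    Every group needs a nonzero sample variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    variances = []
    for g in groups:
        mu = sum(g) / len(g)
        variances.append(sum((x - mu) ** 2 for x in g) / (len(g) - 1))
    pooled = sum((len(g) - 1) * v for g, v in zip(groups, variances)) / (n_total - k)
    num = (n_total - k) * math.log(pooled) - sum(
        (len(g) - 1) * math.log(v) for g, v in zip(groups, variances))
    corr = 1.0 + (sum(1.0 / (len(g) - 1) for g in groups)
                  - 1.0 / (n_total - k)) / (3.0 * (k - 1))
    return (num / corr) / (k - 1)
```

Each group would hold the six $m_s$ (or $m_i$) values of one set in Table I; the calculated $F$ and reduced $\chi^2$ are then compared with the tabulated critical values for the 95% confidence level.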

DESCRIPTION OF ANOVA PROCEDURES

Table II reports the results of three two-way (ANOVAs 1-3) and ten one-way (ANOVAs 4-13) analyses of variance computed using $m_s$ and $m_i$ values determined from chromatograms composing sets A through I1 in Table I. The fourth and fifth columns, respectively, report the Fisher ratios for $\nu_1$ and $\nu_2$ degrees of freedom calculated from the ANOVAs and the RCSSs $\chi^2_{n-1}$ calculated from Bartlett's homogeneity test on $n$ groups of $m_s$ or $m_i$ values. The significance of these values was judged, as stated earlier, relative to the critical Fisher ratios $F^*_{\nu_1,\nu_2}$ and the critical RCSSs $\chi^{2*}_{n-1}$ for the 95% confidence level (14), values of which are reported at the bottom of the table.

The general purpose of the ANOVAs was to determine if select attributes of the SCPs significantly influence the values $m_s$ and $m_i$ calculated from eq 8. The first ANOVA was computed to evaluate independently the influences of the SCP density and the SCP random amplitude range (RAR) on estimates evaluated from chromatograms containing Gaussian SCPs. The saturation was varied from $\alpha$ = 0.167 to 0.667, in increments of 0.167, for both a 50- and 150-fold range. (Here, $\alpha$ is defined by eq 2 with $R_s^* = 0.5$.) ANOVAs 2 and 3 were calculated to evaluate independently the influences of these same attributes on estimates calculated from chromatograms containing ECG single-component peaks. The saturation was varied over the range $\alpha$ = 0.167-0.500, in increments of 0.167, for both a 50- and 150-fold range. The $\sigma/\tau$ ratios of the SCPs in these chromatograms were 0.35 (ANOVA 2) and 0.20 (ANOVA 3).

The influence of the RAR is partially characterized by ANOVAs 1-3. Other ADFs may affect the $m$ estimates somewhat differently; seven one-way ANOVAs were consequently calculated to examine the influence of four ADFs: a 50-fold RAR, a 150-fold RAR, a constant ADF, and an exponential ADF. One-way ANOVAs were calculated because the significant influence of saturation on $m_s$ and $m_i$, which is revealed by ANOVAs 1-3, would (most likely) be reevaluated by a two-way ANOVA with the variates $\alpha$ and ADF. This reevaluation would occur because the two amplitude functions compared by ANOVAs 1-3 would also be compared by the two-way ANOVA. The influence of only the ADF on the estimates was consequently characterized by the simpler one-way ANOVAs. ANOVAs 4-10, respectively, examine the influence of the ADF on estimates calculated from chromatograms containing Gaussian SCPs and characterized by the saturation factors $\alpha$ = 0.167, 0.333, 0.500, and 0.667, ECG single-component peaks characterized by $\sigma/\tau$ = 0.20 and $\alpha$ = 0.167 and 0.500, and ECG single-component peaks characterized by $\sigma/\tau$ = 0.35 and $\alpha$ = 0.333.

Table II. One- and Two-way ANOVAs Calculated from Data in Table I(a)

ANOVA  variate  F (m_s)            F (m_i)             χ²(n-1) (m_s)   χ²(n-1) (m_i)
1      α        F3,40 = 1.30       F*3,40 = 4.83       χ²7 = 0.48      χ²7 = 2.00
       RAR      F1,40 = 0.03       F1,40 = 0.14
       INT      F3,40 = 0.28       F3,40 = 0.06
2      α        F*2,30 = 40.4      F*2,30 = 4.14       χ²5 = 0.32      χ²5 = 1.73
       RAR      F1,30 = 0.92       F1,30 = 0.46
       INT      F2,30 = 1.80       F2,30 = 1.13
3      α        F*2,30 = 23.7      F2,30 = 2.07        χ²5 = 0.08      χ²5* = 2.38
       RAR      F1,30 = 1.87       F1,30 = 2.16
       INT      F2,30 = 2.56       F2,30 = 2.42
4      ADF      F3,20 = 1.09       F3,20 = 1.49        χ²3 = 0.12      χ²3 = 0.18
5      ADF      F3,20 = 0.45       F3,20 = 1.35        χ²3 = 0.16      χ²3 = 1.62
6      ADF      F3,20 = 0.60       F3,20 = 1.59        χ²3 = 0.19      χ²3 = 0.22
7      ADF      F*3,20 = 7.32      F*3,8.8 = 6.31      χ²3 = 1.99      χ²3* = 6.45
8      ADF      F3,20 = 0.16       F3,10.8 = 0.17      χ²3 = 1.39      χ²3* = 3.93
9      ADF      F*3,20 = 4.43      F3,20 = 2.06        χ²3 = 0.59      χ²3 = 2.36
10     ADF      F*3,20 = 4.63      F*3,20 = 3.11       χ²3 = 0.24      χ²3 = 0.38
11     ADF      F2,15 = 0.75       F2,15 = 0.13        χ²2 = 0.66      χ²2 = 2.50
12     ADF      F*2,15 = 4.31      [illegible]         χ²2 = 0.70      [illegible]
13     ADF      F2,15 = 0.10       F2,15 = 0.18        χ²2 = 0.37      χ²2 = 0.55

Critical values: F*1,30 = 4.17; F*1,40 = 4.08; F*2,15 = 3.68; F*2,30 = 3.32; F*3,8.8 ≈ F*3,9 = 3.86; F*3,10.8 ≈ F*3,11 = 3.59; F*3,20 = 3.10; F*3,40 = 2.85; χ²2* = 3.00; χ²3* = 2.60; χ²5* = 2.22; χ²7* = 2.01.

(a) Variates examined are saturation ($\alpha$), 50- and 150-fold random amplitude ranges (RAR), amplitude distribution function (ADF), and interaction (INT). $F$ and $\chi^2$ ($F^*$ and $\chi^{2*}$) are the calculated (and critical) Fisher ratios and reduced $\chi^2$ statistics. Nonintegral degrees of freedom calculated using Snedecor's procedure are rounded to the nearest whole number to evaluate $F^*$. Significant tests are indicated by asterisks.

DISCUSSION

The success of the procedure for estimating $m$ depends on a close correspondence between the positions of the observed maxima and the underlying SCPs. A study of Table I suggests that any factor which diminishes the number of observable maxima for a given number of SCPs reduces the reliability of the values $m_s$ and $m_i$. Intuitively, one would expect that high levels of SCP saturation, large amounts of SCP asymmetry (tailing), and large SCP amplitude ranges should independently contribute to the loss of maxima and to inaccuracy. This hypothesis is largely confirmed by ANOVAs 1-13.


The appropriate $F$ ratios for the first three ANOVAs indicate that $m_s$ values derived from chromatograms containing ECG single-component peaks (ANOVAs 2-3) are more depressed by increasing levels of saturation than are $m_s$ values derived from chromatograms containing Gaussian SCPs (ANOVA 1). ANOVAs 1-2 indicate that the $m_i$ values are also affected, although to a lesser degree, by the changes in saturation. The 3-fold variation in the range of uniformly distributed SCP amplitudes does not significantly influence the estimated component numbers (ANOVAs 1-3). There is nevertheless an increase in the RAR and INT Fisher ratios in ANOVAs 2-3 relative to their magnitudes in ANOVA 1. The increase in both ratios suggests that the estimates are more sensitive, over the $\alpha$ range examined, to the range of a uniform amplitude distribution when the SCPs have tails (which can obscure SCPs of relatively small amplitude) and also that this sensitivity increases with saturation.

ANOVAs 4-6 indicate that the values $m_i$ and $m_s$ estimated from chromatograms containing Gaussian SCPs are largely independent of the SCP amplitude distribution functions (ADFs) examined if $\alpha \le 0.5$ but are significantly influenced by the ADF when $\alpha$ = 0.667 (ANOVA 7). The ADF does not strongly influence component numbers derived from chromatograms containing exponentially convoluted Gaussian SCPs if $\alpha$ = 0.167 (ANOVA 8), but its influence becomes significant with increasing saturation (ANOVAs 9-10). In general, the error in the values $m_s$ and $m_i$ is more severe at a fixed level of saturation when the SCPs have tails (cf. ANOVAs 6 and 9; 5 and 10). The observation that the 50- and 150-fold RARs yield comparable results is not very surprising because of the general similarity of the two distributions. More meaningful is the comparison of these two ADFs with the constant ADF.
The significant ANOVAs 7, 9, and 10 were consequently reevaluated, excluding from the analyses component numbers derived from chromatograms containing SCPs distributed exponentially in amplitude, and the results of these reevaluations are reported as ANOVAs 11-13 in Table II. For all but one analysis, the F ratios are insignificant, suggesting that large variations in uniform RARs have little influence on m_s and m_i over the α range investigated and that the major source of variation in ANOVAs 7, 9, and 10 is the exponential ADF. The latter conclusion is reasonable because most SCPs produced from the exponential distribution are relatively small and are easily obscured by overlap. Furthermore, an a posteriori ANOVA using the sum-of-squares simultaneous procedure (15) indicates that the significant variation determined by ANOVA 12 arises from the 150-fold RAR, the second largest range of the four ADFs examined.

Table III reports the means and standard deviations of the percentage errors PE in m_s and m_i values computed from chromatograms containing m = 100, 200, and 300 Gaussian SCPs distributed exponentially in amplitude and characterized by the saturation values α = 0.333, 0.500, and 0.667. For each saturation, the PEs in the different m values were compared using one-way ANOVAs, and the F ratios and RCSSs so calculated are reported in the final column of the table. These data indicate that the PEs are largely independent of the number of SCPs in the chromatogram over the α range examined, although there is clearly a large statistical variation in the m estimates when α = 0.667 and m = 100. These results suggest that we can establish general criteria for the relative accuracy of component numbers calculated using the procedure proposed here which are largely independent of m (at least over the range 100-300) and which depend primarily on α or, if the SCPs are asymmetric, on α and σ/τ.
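The one-way comparison described above can be reproduced approximately from the tabulated means and standard deviations alone. The sketch below uses the Table III entries for α = 0.500 and assumes six replicate chromatograms for each m value, which is consistent with 2 and 15 degrees of freedom but is not stated in this section. It also assumes that RCSS denotes Bartlett's χ² divided by its degrees of freedom (k - 1 = 2). The function names are illustrative.

```python
import math


def anova_from_summary(means, sds, n):
    """One-way ANOVA F ratio from group means and standard deviations,
    ASSUMING every group holds n observations."""
    k = len(means)
    grand = sum(means) / k
    ms_between = n * sum((x - grand) ** 2 for x in means) / (k - 1)
    ms_within = sum(s ** 2 for s in sds) / k  # pooled variance for equal n
    return ms_between / ms_within


def bartlett_from_summary(sds, n):
    """Bartlett's chi-square for homogeneity of variances (equal n)."""
    k = len(sds)
    variances = [s ** 2 for s in sds]
    pooled = sum(variances) / k
    m_stat = (n - 1) * (k * math.log(pooled) - sum(math.log(v) for v in variances))
    correction = 1.0 + (k + 1) / (3.0 * k * (n - 1))
    return m_stat / correction


n = 6  # ASSUMED number of replicates per m value
# Table III, alpha = 0.500, columns m = 100, 200, 300
ms_means, ms_sds = (-20.2, -10.6, -12.7), (10.4, 11.3, 4.3)
mi_means, mi_sds = (-2.3, 6.6, 3.9), (15.3, 15.6, 8.8)

f_s = anova_from_summary(ms_means, ms_sds, n)
f_i = anova_from_summary(mi_means, mi_sds, n)
rcss_s = bartlett_from_summary(ms_sds, n) / 2  # ASSUMED definition of RCSS

print(f"F (m_s) = {f_s:.2f}, F (m_i) = {f_i:.2f}, RCSS (m_s) = {rcss_s:.2f}")
```

Under these assumptions the two F ratios come out near 1.80 and 0.68 and the reduced χ² near 2.00, close to the entries F2,15 = 1.81, F2,15 = 0.68, and 2.00 that appear in the final column of Table III. Both F ratios lie well below the critical value F*2,15 = 3.68, in line with the statement that the PEs are largely independent of m.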
Examination of Tables II and III suggests that the expected PEs in m_s and m_i values derived from chromatograms


Table III. Means and Standard Deviations of the Percentage Errors PE in m_s and m_i Values Computed from Chromatograms Characterized by α = 0.333, 0.500, and 0.667 and Which Contain m = 100, 200, and 300 SCPs Distributed Exponentially in Amplitude (a)

                            PE
α (Rs* = 0.5)   estimate    m = 100          m = 200          m = 300
0.333           m_s           1.2 ± 7.2       -0.6 ± 11.7      -5.8 ± 7.9
0.333           m_i          11.8 ± 8.3       13.1 ± 23.0       9.2 ± 6.4
0.500           m_s         -20.2 ± 10.4     -10.6 ± 11.3     -12.7 ± 4.3
0.500           m_i          -2.3 ± 15.3       6.6 ± 15.6       3.9 ± 8.8
0.667           m_s         -30.0 ± 20.9     -24.2 ± 3.5      -21.3 ± 6.4
0.667           m_i          -5.5 ± 27.8      -4.5 ± 4.7       -1.8 ± 13.4

F and RCSS (χ²), final column, entries in printed order: F2,… = 0.22, χ²* = 4.28; F2,15 = 1.81, χ² = 2.00; F2,15 = 0.68, χ² = 0.70; F2,8.4 = 0.82, χ²* = 7.11; F2,… = 0.11, χ² = 5.56; F2,15 = 1.02; χ² = 0.83.

Critical values: F*2,15 = 3.68; F*2,9 = 4.26; F*2,… = 4.46; χ²* = 3.00.

(a) The final column reports the F ratios calculated by ANOVA and the RCSSs calculated by Bartlett's homogeneity test; critical values of F and RCSS for the 95% confidence level are reported at the bottom of the table. Significant tests are indicated by asterisks.

containing Gaussian SCPs will be generally