Bayesian statistical methods for use in mass spectral assignment

Anal. Chem. 1983, 55, 1723-1728

Lothar M. Karrer, Heather L. Gordon, Stuart M. Rothstein,* Jack M. Miller, and Timothy R. B. Jones

Department of Chemistry, Brock University, St. Catharines, Ontario, Canada L2S 3A1

We apply Bayesian statistical methods to estimate the mole fractions of species present in the linear model which describes the observed mass spectral data. We contrast our approach to the usual practice in mass spectrometry which employs least squares, which we argue is not statistically sound. We describe our computer program, with emphasis on exploiting subroutines from commonly available program libraries. Applications involve the deconvolution of overlapping spectra resulting from both successive loss of hydrogen and gain of hydrogen via ion/molecule reactions.

The objective of this paper is to report appropriate statistical methodology for estimating the mole fractions which appear in the linear model which is used to describe mass spectral data. This model represents the expected value of the $p$th peak intensity as

$$E(I_p) = \sum_{s=1}^{S} x_s i_{sp} \tag{1}$$

where $i_{sp}$ is the peak intensity due to substance $s$, $x_s$ is the unknown mole fraction of substance $s$, and there are $S$ species present.

It has been suggested that least-squares procedures be employed for a quantitative analysis of mass spectral data (1-4). However, because the sum of the $P$ observations is normalized to, say, 100%

$$\sum_{p=1}^{P} I_p = 100 \tag{2}$$

the random errors

$$\epsilon_p = I_p - E(I_p) \tag{3}$$

are not statistically independent and probably do not have equal variances and covariances. To ignore these considerations will lead to invalid estimates of the mole fractions (5-7). Furthermore, the requirement that the random errors have zero expectation values imposes an additional condition on the coefficients in the model, eq 1

$$\sum_{p=1}^{P} i_{sp} = 100 \qquad s = 1, ..., S \tag{4}$$

which may also be overlooked.

For the purposes of discriminating among various plausible models, where some mole fractions are hypothesized to be zero, usually mass spectroscopists (e.g., ref 8) have confined themselves to the model which gives the lowest $R$ factor, where

$$R = \sum_{p=1}^{P} \left| I_p^{\mathrm{calcd}} - I_p \right| \tag{5}$$

where $I_p^{\mathrm{calcd}}$ are predicted intensities, using least-squares estimates of the mole fractions. A sum of relative errors or of absolute errors may be used. This approach is undesirable as one can always lower the $R$ factor by introducing additional variable parameters in the model. Furthermore, the statistical method to determine whether or not an additional parameter gives an $R$ factor which differs significantly from that of a model with fewer parameters, Hamilton's test (9, 10), is invalid because of the lack of independence in the random errors.

It was pointed out to us by the referees of an earlier draft of this manuscript that mass spectral data consist of several responses (peak intensities), measured in each run of the spectrum, and normally several runs are taken. This suggests the approach of Box and Draper (7) to estimate the mole fractions, taking care to account for the various linear dependencies in the data (e.g., eq 2). Central to this approach is Bayes' theorem, where under the assumptions detailed in the Theory section below, the joint (posterior) density function of the mole fractions is obtained. The mole fractions which maximize the density are estimates of the mole fractions of species present in the spectra. Furthermore, a joint confidence region for the parameters in the model is obtained. If this region includes areas where one or more $x$ are zero, then alternative models which ignore those species are also consistent with the data.
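The dependence that normalization (eq 2) induces among the errors can be illustrated numerically. The following is a minimal sketch, not part of the original work: the peak intensities, the noise level, and the helper names `normalize` and `covariance` are all assumptions chosen for illustration. It simulates runs with independent noise, normalizes each run to 100, and shows that every row of the sample covariance matrix then sums to zero, so the matrix is singular.

```python
import random

def normalize(run, total=100.0):
    """Scale one run so that its peak intensities sum to `total` (eq 2)."""
    s = sum(run)
    return [total * v / s for v in run]

def covariance(rows):
    """Sample covariance matrix of the columns of `rows` (one row per run)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

random.seed(0)
true_intensities = [10.0, 30.0, 40.0, 20.0]   # hypothetical spectrum
runs = [normalize([random.gauss(m, 1.0) for m in true_intensities])
        for _ in range(200)]

cov = covariance(runs)
# Each row sum is cov(I_p, sum of all peaks) = cov(I_p, 100) = 0,
# so the vector of ones lies in the null space of the covariance matrix.
row_sums = [sum(row) for row in cov]
print(row_sums)
```

The off-diagonal elements of `cov` are also nonzero even though the simulated noise was independent before normalization, which is the situation ordinary least squares does not account for.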

0003-2700/83/0355-1723$01.50/0 © 1983 American Chemical Society


ANALYTICAL CHEMISTRY, VOL. 55, NO. 11, SEPTEMBER 1983

We apply this statistical procedure to three examples which are typical of a problem encountered routinely in mass spectrometry, namely, the deconvolution of overlapping spectra resulting from successive loss and/or gain of hydrogens in the spectra of polyisotopic molecules.

THEORY

Several mass spectral traces or data traces are made for the sample under study. For each of $R$ runs we compute the relative intensity for each of the $P$ peaks in the spectrum and normalize the data for each run

$$\sum_{p=1}^{P} I_{pr} = 100 \qquad r = 1, ..., R \tag{6}$$

We require that $R$ exceed $P$ by a reasonable margin. We assume a model for the expected value of the intensities which is linear in the mole fractions of each of the $S$ species present

$$E(I_{pr}) = \sum_{s=1}^{S} x_s i_{sp} \qquad \text{all } r \tag{7}$$

where $x_s$ is the unknown mole fraction of species $s$ and $i_{sp}$ is the known contribution species $s$ makes to the intensity of peak $p$ (subject to eq 4), and

$$\sum_{s=1}^{S} x_s = 1 \tag{8}$$

In order to satisfy condition (8), it is convenient to recast our model as follows: let species 1 be a species which is known to be present, e.g., the molecular ion. Substituting $1 - x_2 - x_3 - ...$ for $x_1$ in eq 7 gives for all $r$

$$E(Y_{pr}) = f_p(x) = \sum_{s=2}^{S} x_s (i_{sp} - i_{1p}) \tag{9}$$

where $f_p$ models the expected value of $Y_{pr}$ and

$$Y_{pr} = I_{pr} - i_{1p} \tag{10}$$

Thus, our mathematical model for peak $p$ and run $r$ is

$$Y_{pr} = f_p(x) + \epsilon_{pr} \tag{11}$$

where $\epsilon_{pr}$ is the random error in the observation. We assume that for each run $\epsilon_{pr}$ is a multivariate normal random error such that

$$\begin{aligned}
E(\epsilon_{pr}) &= 0 && \text{all } r, p \\
E(\epsilon_{pr}\epsilon_{qr'}) &= 0 && \text{all } r \neq r';\ p, q \\
E(\epsilon_{pr}\epsilon_{qr}) &= \sigma_{pq} && \text{all } r
\end{aligned} \tag{12}$$

This model allows for correlations among random errors within a given run but assumes no correlations among errors from different runs. Box and Draper (11) showed that given the data, eq 10, the model, eq 12, and a "noninformative" prior distribution of $x$ and $\sigma$, the posterior distribution of $x$ is proportional to $\det(v)^{-R/2}$, where $v$ is $(P \times P)$

$$v_{ij}(x) = \sum_{r=1}^{R} [Y_{ir} - f_i(x)][Y_{jr} - f_j(x)] \tag{13}$$

Estimates of $x$ yielding the maximum posterior density are obtained by minimizing $\det(v)$. However, the observations $Y$, eq 10, are not linearly independent, e.g.

$$\sum_{p=1}^{P} Y_{pr} = 0 \qquad \text{all } r \tag{14}$$

and this makes $v$ singular. Accordingly, we must modify our approach to remove this singularity, and, indeed, any others which might exist.

Box et al. (7) and, more recently, McLean et al. (12) have treated the problem of singularities in multiresponse modeling in some detail. Suppose for all runs there are $M$ exact linear relationships in the data

$$B Y_{(r)} = c \qquad \text{all } r \tag{15}$$

where $B$ is $(M \times P)$, $c$ is a column vector of $M$ constants, and

$$Y_{(r)}' = (Y_{1r}, Y_{2r}, ..., Y_{Pr}) \tag{16}$$

Also suppose that the same linear relationships hold for the calculated values (in obvious notation)

$$B f_{(r)} = c \qquad \text{all } r \tag{17}$$

Then

$$B (Y_{(r)} - f_{(r)}) = 0 \qquad \text{all } r \tag{18}$$

and $v$ will have $M$ zero eigenvalues, as will the matrix $DD'$, where

$$D_{pr} = Y_{pr} - \bar{Y}_p \tag{19}$$

and

$$\bar{Y}_p = \sum_{r=1}^{R} Y_{pr} / R \tag{20}$$

We assume that eq 15 and 17 hold in our case, and we will apply the following procedure of Box et al. (7) to remove the singularities. (McLean et al. (12) extended this procedure to other cases of singularities in multiresponse data.) Obtain the eigenvalues $\lambda_k$ and $P$-dimensional eigenvectors $z_k$ by solving

$$z_k' DD' = \lambda_k z_k' \tag{21}$$

and normalize the eigenvectors so that

$$z_k' z_k = 1 \qquad k = 1, ..., P \tag{22}$$

There should be $M$ eigenvalues equal to zero, and the data are transformed to give $P - M$ linearly independent values for each run $r$

$$\mathcal{Y}_{mr} = \sum_{p=1}^{P} z_{mp} Y_{pr} \qquad m = 1, ..., P - M \tag{23}$$

$$\mathcal{F}_m(x) = \sum_{p=1}^{P} z_{mp} f_p(x) \qquad m = 1, ..., P - M \tag{24}$$

where $\mathcal{F}_m$ models the expected value of the $m$th linearly independent observation for any run. The values of $z_k$ used in eq 23 and 24 are those with nonzero eigenvalues $\lambda_k$. Similarly, the analogue of $v$, eq 13, is a $(P - M) \times (P - M)$ matrix

$$V_{mn}(x) = \sum_{r=1}^{R} [\mathcal{Y}_{mr} - \mathcal{F}_m(x)][\mathcal{Y}_{nr} - \mathcal{F}_n(x)] \tag{25}$$

We obtain point estimates of $x$, denoted $\hat{x}^{\mathrm{mode}}$, corresponding to the maximum posterior density of the parameters given the observations, $p(x|Y)$, by minimizing $\det(V)$, where

$$p(x|Y) \propto \det[V(x)]^{-R/2} \tag{26}$$
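The reduction described by eq 9, eq 19-22, and eq 23-25 can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' program: the component spectra `i_sp`, the mole fractions `x_true`, the noise level, and the function names are hypothetical, and `numpy.linalg.eigh` stands in for the eigenvalue routine.

```python
import numpy as np

def reduced_model(i_sp, x_rest):
    """Eq 9: f_p(x) = sum over s >= 2 of x_s (i_sp - i_1p); i_sp has shape (S, P)."""
    return np.asarray(x_rest) @ (i_sp[1:] - i_sp[0])

def independent_directions(Y, rel_tol=1e-10):
    """Eq 19-22: unit eigenvectors of DD' whose eigenvalues are not numerically zero."""
    D = Y - Y.mean(axis=1, keepdims=True)        # Y has shape (P, R)
    lam, vecs = np.linalg.eigh(D @ D.T)          # eigenvectors are the columns of vecs
    keep = lam > rel_tol * lam.max()             # drop the M (near-)zero eigenvalues
    return vecs[:, keep].T                       # shape (P - M, P)

def det_V(x_rest, Z, Y, i_sp):
    """Eq 23-25: determinant of the reduced (P - M) x (P - M) matrix V(x)."""
    resid = Z @ (Y - reduced_model(i_sp, x_rest)[:, None])
    return np.linalg.det(resid @ resid.T)

# Hypothetical component spectra (each row sums to 100, eq 4) and mole fractions.
i_sp = np.array([[60.0, 25.0, 10.0, 5.0, 0.0],
                 [10.0, 55.0, 20.0, 10.0, 5.0],
                 [5.0, 10.0, 50.0, 25.0, 10.0]])
x_true = np.array([0.5, 0.3, 0.2])

rng = np.random.default_rng(0)
R = 12                                           # runs; R exceeds P = 5
I = (x_true @ i_sp)[:, None] + rng.normal(0.0, 0.5, size=(5, R))
I = 100.0 * I / I.sum(axis=0)                    # normalize each run (eq 6)
Y = I - i_sp[0][:, None]                         # eq 10

Z = independent_directions(Y)
print(Z.shape)                                   # one direction is removed by eq 14
print(det_V(x_true[1:], Z, Y, i_sp), det_V(np.zeros(2), Z, Y, i_sp))
```

Minimizing `det_V` over `x_rest` with any general-purpose optimizer would then give the mode of eq 26; the sketch only evaluates the criterion at two points to show that it is smaller near the mole fractions used to simulate the data.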

For the purposes of establishing joint and marginal confidence regions for $x$, we follow the approach of Stewart and Sørensen (13). Approximate highest posterior density (HPD) contours, which define a region containing $100(1 - \alpha)\%$ of the posterior probability density, are $x$ values which obey the following equation:

$$\frac{(x - \hat{x}^{\mathrm{mode}})' A (x - \hat{x}^{\mathrm{mode}})}{(S - 1) s^2} = F(\alpha, S - 1, R - S + 1) \tag{27}$$

where

$$s^2 \equiv \det[V(\hat{x}^{\mathrm{mode}})] / (R - S + 1) \tag{28}$$

and $F(\alpha, \nu_n, \nu_d)$ is the upper $100\alpha\%$ point of an $F$ distribution with $\nu_n$, $\nu_d$ degrees of freedom in the numerator and denominator, respectively. The matrix $A$ is an $(S - 1) \times (S - 1)$ matrix of computed constants in the quadratic approximation to $\det[V(x)]$ (eq 29; Stewart and Sørensen (ref 14, eq 9)).

The marginal distribution of a single mole fraction, say, $x_i$, has a $t$ distribution about $\hat{x}_i^{\mathrm{mode}}$ with $R - S + 1$ degrees of freedom (eq 30), where $c_{ii}$ is the $i$th diagonal element of $A^{-1}$. We use eq 27 to determine if the joint confidence region for the mole fractions includes areas where one or more $x$ are zero, and we use eq 30 to determine an approximate confidence interval for the $i$th mole fraction. These equations are like those for a linear, single-response model (Box and Tiao (15)), and the degrees of freedom reflect that there are $S - 1$ linearly independent parameters in the model, i.e., eq 8.
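The HPD test of eq 27 amounts to comparing a scaled quadratic form with an F critical value. The sketch below is a hedged illustration only: the matrix `A`, the residual mean square `s2`, and the function names are hypothetical, and the critical value is supplied by the caller rather than computed. The mode and the critical value 2.5 are the figures quoted in Table IV.

```python
def residual_df(R, S):
    """Denominator degrees of freedom used in eq 27 and 28."""
    return R - S + 1

def hpd_statistic(x, x_mode, A, s2):
    """Left-hand side of eq 27: (x - mode)' A (x - mode) / ((S - 1) s^2)."""
    d = [xi - mi for xi, mi in zip(x, x_mode)]
    n = len(d)                                   # n = S - 1 free mole fractions
    quad = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return quad / (n * s2)

def inside_hpd(x, x_mode, A, s2, f_crit):
    """True if x lies inside the approximate 100(1 - alpha)% HPD region."""
    return f_crit >= hpd_statistic(x, x_mode, A, s2)

# Degrees of freedom for the two data sets discussed in the Applications.
print(residual_df(18, 2), residual_df(17, 4))    # 17 and 14

# Hypothetical A and s2; mode taken from the Bayes row of Table IV.
x_mode = [0.451, 0.307, 0.069]
A = [[150.0, 0.0, 0.0], [0.0, 200.0, 0.0], [0.0, 0.0, 80.0]]
s2 = 1.0e-4
x_test = [0.451, 0.307, 0.0]                     # third species set to zero
f_crit = 2.5                                     # F(0.1; 3, 14) as quoted in Table IV
print(hpd_statistic(x_test, x_mode, A, s2), inside_hpd(x_test, x_mode, A, s2, f_crit))
```

With these made-up inputs the test point falls far outside the region, which is the situation in which the species cannot be dropped from the model.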

BAYESIAN ALGORITHM

It is apparent that a statistically valid analysis of mass spectral data requires highly nonstandard statistical methods. Programming effort is greatly reduced, however, by using subroutines from the IMSL (16) package. In this section we will describe our computer program and specify how we used IMSL and other routines.

We input the $(R \times P)$ raw intensity measurements, $I$, and the $P$ peak intensities due to each of $S$ species, $i$, and normalize each according to eq 2 and 4. The mole fraction of the molecular ion is eliminated from the analysis by using $Y$ as data (eq 10) and assuming the $S - 1$ parameter model given by eq 11. We compute the elements of $D$, eq 19 and 20, and find the eigenvalues $\lambda$ and eigenvectors $z$, eq 21, by using IMSL subroutines VTPROF and EIGRS. There are $M$ zero eigenvalues (say, … or smaller in magnitude), and the $P - M$ eigenvectors which have nonzero eigenvalues are used to calculate linearly independent observations, eq 23.

We occasionally encountered numerical problems when searching for the mode of the posterior probability density, due to the small values of some mole fractions near the mode. We overcame these difficulties by first doing a least-squares fit of the model given by eq 11 using the original observations, $Y$ (eq 10). These estimates were obtained by using IMSL subroutine ZSCNT, constraining $x$ to have positive values, bounded by unity, by using a $\sin^2$ transformation (Box (17)). (This routine also returns a parameter denoted FNORM, which is small if an adequate numerical solution has been obtained.) Least-squares values of $x$ which were small (less than 5 × …) we took as zero. In such cases the design matrix for the model was altered to ignore these species, and the least squares was repeated. We used the new least-squares estimates as the initial set of parameters for the function minimization subroutine.

For the computation of $\hat{x}^{\mathrm{mode}}$, the value of the mole fractions at the mode of the posterior probability density, eq 26, we used function minimization subroutine VA04A, Algorithm 60 from Quantum Chemistry Program Exchange. This uses Powell's method (18) to compute the optimum set of $x$'s, that is, values which minimize $\det(V)$, eq 25. The determinant is computed by using IMSL routine LINV3F, and values of $\mathcal{F}$ required for the computation of $V$ are computed from eq 24 for any $x$.

The program computes the residual mean square, eq 28, using $\hat{x}^{\mathrm{mode}}$ and IMSL routine LINV3F. Each element $V_{mn}(x)$, eq 25, is expanded to second order in $x$, and then $\det(V)$ is evaluated, to second order in $x$, using Gauss' method, modified for polynomial arrays (Stewart and Sørensen (14)). The coefficients multiplying the second-order terms are $A$, eq 29. The $c_{ii}$ values, eq 30, are computed from $A$ by using IMSL routine VMULFT, and this routine is also used to compute the numerator in eq 27. Joint and marginal confidence regions for $x$ are then computed from eq 27 and 30, respectively; critical values for the $F$ and $t$ statistics are computed by using IMSL subroutines MDFI and MDSTI.
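The $\sin^2$ transformation used to keep each mole fraction between zero and unity can be sketched as follows. This is a minimal illustration assuming the usual form $x = \sin^2\theta$, with $\theta$ the unconstrained search variable; the function names are hypothetical and the code is not taken from the authors' program.

```python
import math

def to_fraction(theta):
    """Map an unconstrained search variable theta to a mole fraction in [0, 1]."""
    return math.sin(theta) ** 2

def to_angle(x):
    """Inverse map on [0, 1]; returns the angle in [0, pi/2]."""
    return math.asin(math.sqrt(x))

def dfraction_dtheta(theta):
    """Derivative dx/dtheta = 2 sin(theta) cos(theta) = sin(2 theta)."""
    return math.sin(2.0 * theta)

theta = to_angle(0.25)
print(theta, to_fraction(theta))

# Near x = 0 the derivative vanishes, so the fit becomes insensitive to theta.
for x in (1e-1, 1e-3, 1e-6):
    print(x, dfraction_dtheta(to_angle(x)))
```

The vanishing derivative near zero is one plausible reading of why mole fractions that approach zero caused numerical trouble in the constrained least-squares step, and why such species were dropped before refitting.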

APPLICATIONS

Our first application involves the deconvolution of overlapping spectra resulting from both successive loss of hydrogen and possible gain of one hydrogen via ion/molecule reactions for H3B3N3Cl3. For this case an appropriate model could include contributions from M, M + H, M − H, M − 2H, and M − 3H.

Mass spectral data were collected on a Kratos/AEI MS30 mass spectrometer using a Kratos DS55 data acquisition system. The instrument was operating in electron impact mode (70 eV), with a 4-kV accelerating potential using a scan speed of 30 s/decade. Data collection is initiated immediately after sample introduction, allowing a settling/equilibration time of five scans. It was found that variations in the number of peaks detected could be as high as 20% without adverse effects on the quality of the data. Data collection was terminated when sample was exhausted. This stage is usually indicated by a significant decrease in the number of peaks detected (>30%) and a corresponding decline in the total ion current (usually >1 in 10²).

For H3B3N3Cl3 a total of 18 (R) scans of the sample were used; the 10 (P) peak intensities chosen were in the range 180-189 m/e. This meets the requirement of a valid data analysis: R > P. The raw data appear in Table I.

The Bayesian algorithm described above ran smoothly until the constrained least-squares fit of $Y$, eq 10, to the $S - 1$ parameter model, eq 11. The values of $x$ which we obtained included very small values for $\hat{x}_{M+H}$, $\hat{x}_{M-2H}$, and $\hat{x}_{M-3H}$, and a large value for "FNORM" (10⁵) was returned from IMSL routine ZSCNT. This indicated that these estimates for $x$ were not satisfactory; probably this was due to numerical problems associated with the estimated mole fractions for M + H, M − 2H, and M − 3H, which rapidly approached zero under the $\sin^2$ transformation. Accordingly, we dropped these species from the model and repeated the least squares. Apparently a good least-squares estimate was obtained for the remaining mole fraction as "FNORM" was quite small.

The eigenvalue-eigenvector analysis section of the algorithm identified one zero eigenvalue. Thus there were only nine linearly independent observations per run; matrix $V$ (eq 25) is (9 × 9). An exact linear dependency arises from the normalization imposed on the intensities, eq 6.

We employed the constrained least-squares estimates as initial estimates in the parameter space search for the values which maximize the posterior probability density of $x$. (Values of $x$ were again constrained to be positive and bounded by unity.) Convergence to the mode was obtained in four iterations (22 evaluations of $\det(V)$); the value of $\hat{x}^{\mathrm{mode}}_{M-H}$ is the point estimate of $x_{M-H}$. The values of $\hat{x}^{\mathrm{LS}}_{M-H}$ (%) and $\hat{x}^{\mathrm{mode}}_{M-H}$ (%) together with the 90% confidence intervals for them were 1.9 ± 1.0 and 1.4 ± 0.2, respectively. It is clear that the Bayesian estimate is more precise than the least-squares estimate: this is expected because the Bayesian analysis accounts for the statistical dependence in the random errors. The results of the Bayesian analysis are summarized in Table II. Values of the estimated variance-covariance matrix elements, $\hat{\sigma}$, are also reported where

$$\hat{\sigma} = V(\hat{x}^{\mathrm{mode}}) / (P - M + R + 1) \tag{31}$$

Table I. Observed Intensities for (H3B3N3Cl3)+

                                              m/e
run      189     188      187      186      185      184      183      182      181     180
  1    18542   15750   176723   138304   581350   412826   670464   434304   103440    9338
  2    16635   13870   157952   123206   520243   376160   598042   383712    94918    8540
  3    15830   13386   148390   115418   483558   347411   565171   360902    89446    7626
  4    15885   13318   153139   116640   500659   364198   581402   377267    93278    7923
  5    17424   14677   163744   126394   538061   392442   628326   400026    98506    8642
  6    21573   18452   211571   162381   681933   489114   795392   507290   123040   11201
  7    21810   18800   212154   161946   677427   491136   784102   510925   124256   11290
  8    23634   19890   224858   173491   729958   527846   847718   544307   133798   11886
  9    22718   19660   219341   167686   709018   509389   810189   518477   128922   11252
 10   115404   93868  1069376   831184  3470784  2487296  4025984  2558656   620240   54825
 11    75260   63585   711024   539040  2309888  1638784  2652864  1717376   409488   34654
 12    59697   53792   596320   449584  194E104  1403968  2228096  1435840   351744   31454
 13    61494   52533   592832   452528  1926720  1390272  2286720  1458560   358208   30973
 14    58921   49707   592112   451712  1858048  1339776  2173568  1380096   339312   27490
 15    53345   47134   517552   397344  1663936  1205056  1925376  1246016   309632   24650
 16    48593   44389   489856   371440  1566144  1131904  1816960  1177216   285920   22573
 17    39645   33806   370480   276192  1178944   841488  1359616   964800   222244   19248
 18    30824   28433   302512   230936   969280   689840  1120960   717152   179172   15468

Table II. Bayesian Analysis of (H3B3N3Cl3)+ Mass Spectral Data

Compositions (%)

                     x(M+H)    x(M)     x(M−H)       x(M−2H)   x(M−3H)
least squares (a)    10^-3     98       2            10^-3     10^-3
least squares (a)              98 (b)   2 ± 1
Bayes (c)                      98.6     1.4 ± 0.2

Eigenvector-Eigenvalue Analysis

λ: 10^-12, 2 (−3), 2 (−3), 3 (−3), 3 (−2), 3 (−2), 1 (−1), 2 (−1), 4 (−1), 3

90% Confidence Intervals (d)

M − H: c = 1.8 (−3); s (e) = 6.7 (−7); t = 1.7

Observed and Calculated Spectra (Averaged Intensities)

m/e                        189    188    187    186    185     184     183     182    181    180
obsd %                     0.72   0.62   6.98   5.34   22.56   16.24   26.16   16.90  4.13   0.36
calcd %, least squares     0.75   0.64   7.04   5.41   22.61   16.27   25.98   16.73  4.19   0.39
calcd %, Bayes             0.75   0.64   7.07   5.40   22.70   16.23   26.03   16.68  4.13   0.37

sum of squares, least squares: 0.077
sum of squares, Bayes: 0.095

Estimated Variance-Covariances, σ̂ (f)

      1          2          3           4          5          6           7           8           9
1   0.43 (−3)  0.32 (−3)  −0.63 (−3)  0.51 (−3)  0.71 (−3)  −0.82 (−3)  −0.27 (−2)  −0.41 (−2)  −0.36 (−2)
2              0.67 (−3)  −0.88 (−3)  0.70 (−3)  0.98 (−3)  −0.11 (−2)  −0.38 (−2)  −0.57 (−2)  −0.50 (−2)
3                         0.21 (−2)   −0.14 (−2) −0.20 (−2) 0.22 (−2)   0.75 (−2)   0.11 (−1)   0.99 (−2)
4                                     0.41 (−2)  0.16 (−2)  −0.18 (−2)  −0.60 (−2)  −0.91 (−2)  −0.79 (−2)
5                                                0.55 (−2)  −0.25 (−2)  −0.85 (−2)  −0.13 (−1)  −0.11 (−1)
6                                                           0.17 (−1)   0.97 (−2)   0.15 (−1)   0.13 (−1)
7                                                                       0.56 (−1)   0.49 (−1)   0.43 (−1)
8                                                                                   0.12        0.64 (−1)
9                                                                                               0.31

(a) Constrained fit of model given by eq 11 and 12 to data given by eq 10. (b) Computed from eq 8. (c) Confidence intervals are 90% highest posterior density values. (d) Elements of c are defined after eq 30. (e) Computed by use of residual mean squares (eq 28) and R − S + 1 = 17 residual degrees of freedom (ref 13). (f) Computed from eq 31.
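As a small worked check of the normalization step (eq 6), the first run of Table I can be converted to percent intensities. The sketch below is illustrative only; the variable names are assumptions, and the numbers are the run 1 entries of Table I in the order m/e 189 down to 180.

```python
# Run 1 of Table I, m/e 189, 188, ..., 180.
run1 = [18542, 15750, 176723, 138304, 581350,
        412826, 670464, 434304, 103440, 9338]
mz = list(range(189, 179, -1))

total = sum(run1)
percent = [100.0 * v / total for v in run1]      # eq 6: intensities sum to 100

for m, pct in zip(mz, percent):
    print(m, round(pct, 2))
```

The single-run percentages are close to, but not identical with, the 18-run averages listed as "obsd %" in Table II, which is the run-to-run scatter that the multiresponse analysis models.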

As our last application, the spectrum of (C6H5)2GeCl2 was taken in the same manner as that described for H3B3N3Cl3. A total of 17 scans of the sample were used; peak intensities of a fragment ion, the diphenylgermanium cation cluster, were observed at m/e 221-230. The raw data appear in Table III. The appropriate model for this fragment ion includes contributions from M − Cl2, (M − Cl2) − H, (M − Cl2) − 2H, and (M − Cl2) − 3H. The constrained least-squares fit to the data was satisfactory (FNORM = 10⁻³). The eigenvalue-eigenvector analysis revealed one exact linear dependence; hence $V$ is a (9 × 9) matrix. Constrained search of parameter space to minimize $\det(V)$ required 10 iterations, 107 determinant evaluations. The 90% HPD region was found to include nonzero values for the mole fractions of all species assumed in the model. The 90% confidence intervals on the Bayesian mole fractions (in percent) were all about ±0.2, while those for the least-squares estimates were about ±3. Again the Bayesian analysis has provided a more precise estimate of the mole fractions. The results of the Bayesian analysis are summarized in Table IV.

Table III. Observed Intensities for (C6H5)2Ge+

                                           m/e
run     230     229      228      227      226      225      224      223      222     221
  1    20471   Ell516   180860   297696   313248   265264   185276   179888   106280   43144
  2    18616    85588   184152   300368   312160   273776   193252   185020   104996   40487
  3    12416    84324   181980   305744   318864   265312   191000   181212   103992   36165
  4    21040    05088  1855013   317760   324608   279904   196472   186268   107376   40042
  5    22049    89496  19755!2   317232   342064   292720   200456   188472   112192   39734
  6    18783    95448   191800   316496   342784   287360   203516   192100   110260   40100
  7    20184    93036  1996213   329168   341344   298784   201896   192752   114076   39008
  8    21276    93244   200300   332576   348768   302304   212348   198900   115556   36869
  9    19598    93032   198872   332400   355344   299248   213740   202240   117192   36899
 10    20925    95004  2035413   337584   361968   298192   210672   198496   115224   36433
 11    19018    93156   210672   331584   363568   305600   211792   196476   113600   37224
 12    19094    92684   204400   339296   363328   293136   211056   199956   119896   37914
 13    21813   101536   217020   369168   381696   330880   232788   216404   123784   41689
 14    21560   102332   222820   368576   386880   332160   233660   214952   122028   39888
 15    23035    99852   221916   359824   394816   334096   237200   226508   129396   40288
 16    23770   101212   222980   372288   389632   336912   238056   214508   125340   40857
 17    21856   100692   218840   364400   382112   330336   226508   216416   120732   44822

Table IV. Bayesian Analysis of GeC12H10+ Fragment Ion Mass Spectral Data

Composition (%)

                     x(M−Cl2)   x(M−Cl2)−H    x(M−Cl2)−2H   x(M−Cl2)−3H
least squares (a)    19 (b)     43 ± 3        29 ± 3        9 ± 3
Bayes (c)            17         45.1 ± 0.2    30.7 ± 0.2    6.9 ± 0.4

Eigenvalue-Eigenvector Analysis

λ: 10^-…, 3 (−2), 1 (−1), 1 (−1), 2 (−1), 6 (−1), 9 (−1), 1, 2

HPD Analysis

x(M−Cl2)−3H = 0, x(M−Cl2)−H = 0.451, x(M−Cl2)−2H = 0.307; F = 10³ > F(0.1, 3, 14) = 2.5

90% Confidence Intervals (d)

(M − Cl2) − H: c = 6.8 (−3); (M − Cl2) − 2H: c = 5.1 (−3); (M − Cl2) − 3H: c = 1.3 (−2); s (e) = 1.7 (−2); t = 1.8

Observed and Calculated Spectra (Averaged Intensities)

m/e                          230    229    228     227     226     225     224     223     222    221
obsd (%)                     1.08   4.99   10.81   17.88   18.91   16.09   11.30   10.65   6.17   2.11
calcd (%), least squares     1.81   4.18   10.52   18.11   18.94   16.64   12.37   10.53   5.35   1.56
calcd (%), Bayes             1.67   4.24   9.99    18.51   19.20   16.66   12.33   10.50   5.67   1.23

sum of squares, least squares: 3.74
sum of squares, Bayes: 4.48

Estimated Variance-Covariances, σ̂ (f)

      1          2          3           4           5       6       7       8           9
1   0.18 (−1)  0.11 (−1)  −0.98 (−1)  −0.56 (−1)  −0.15   0.19    −0.15   −0.12       0.19 (−1)
2              0.19 (−1)  −0.75 (−1)  −0.43 (−1)  −0.12   0.14    −0.12   −0.96 (−1)  0.15 (−1)
3                         0.67        0.37        1.03    −1.25   1.03    0.83        −0.13
4                                     0.23        0.59    −0.71   0.58    0.48        −0.73 (−1)
5                                                 1.66    −1.97   1.63    1.32        −0.20
6                                                         2.43    −1.96   −1.59       0.24
7                                                                 1.72    1.31        −0.20
8                                                                         1.17        −0.16
9                                                                                     0.18

(a) Constrained fit of model given by eq 11 and 12 to data given by eq 10. (b) Computed from eq 8. (c) Confidence intervals are 90% highest posterior density values. (d) Elements of c are defined after eq 30. (e) Computed by using residual mean squares (eq 28) and R − S + 1 = 14 residual degrees of freedom (ref 13). (f) Computed from eq 31.

We have found that the Bayesian procedure is quite sensitive to the quality of the observed data for this system. This facet has practical application from the viewpoint of data collection parameters such as scan speed, computer averaging rate, and noise threshold, each of which could be optimized by this type of analysis of the data. As an example we found the minimum values of $\det(V)$, which are related to the random errors in the data, to be quite sensitive to scan speed, typically values of 10⁶, 10², and 10⁴ being observed for scan speeds of 3, 10, and 30 s/decade, respectively, other scan


conditions remaining constant.

SUMMARY

We call into question the techniques in the literature which employ least-squares procedures to estimate the mole fractions of species believed to be present. The most serious difficulties with least squares are the lack of statistical independence in the random errors, failure of homoscedasticity, and exact linear dependencies in the observed data which give rise to a singular covariance matrix of the error vector.

We applied the Bayesian formalism of Box and Draper (7) to estimate the mole fractions, the approaches of McLean et al. (12) and Box et al. (7) to eliminate the linear dependencies in the observed data, and some approximations given by Stewart and Sørensen (13, 14), and we were able to use formulas given by Box and Tiao (15) for a linear, single-response model, to find approximate HPD contours and confidence intervals for the mole fractions. The Bayesian confidence intervals are narrower than those obtained from the least-squares analysis.

All of our applications involved the deconvolution of overlapping spectra resulting from successive loss of hydrogen and, in some cases, gain of hydrogen via ion/molecule reactions. In these examples we emphasized how to interpret the information obtained from the statistical analysis, and we pointed out numerical problems which may force reduction of the assumed model to one involving fewer species.

It must be stressed that the statistical method presented may well reject species that an experienced mass spectroscopist would think are present. However, if a species is rejected statistically, one may still wish to claim the existence of it, perhaps verified by exact mass measurement, but one certainly cannot quantify such a low abundance species based on simple low resolution data.

We are indebted to the referees for alerting us to the Bayesian approach for the analysis of multiresponse data and for their valuable comments which facilitated the writing of this paper.

LITERATURE CITED

(1) Clark, Hayden A.; Jurs, Peter C. Anal. Chim. Acta 1981, 132, 75-88.
(2) Benz, Wolfgang Anal. Chem. 1980, 52, 248-252.
(3) Blackburn, James A. Anal. Chem. 1965, 37, 1000-1003.
(4) Brauman, John I. Anal. Chem. 1966, 38, 607-610.
(5) Eakman, J. M. Ind. Eng. Chem. Fundam. 1969, 8, 53-58.
(6) Erjavec, J. Ind. Eng. Chem. Fundam. 1970, 9, 187.
(7) Box, G. E. P.; Hunter, W. G.; MacGregor, J. F.; Erjavec, J. Technometrics 1973, 15, 33-51.
(8) Andrews, Mark A.; Kirtley, Stephen W.; Kaesz, Herbert D. Adv. Chem. Ser. 1978, No. 167, 215-231.
(9) Hamilton, Walter Clark "Statistics in Physical Science"; Ronald Press: New York, 1964; Chapter 4.
(10) Hamilton, Walter Clark Acta Crystallogr. 1965, 18, 502-510.
(11) Box, George E. P.; Draper, Norman R. Biometrika 1965, 52, 355-365.
(12) McLean, D. D.; Pritchard, D. J.; Bacon, D. W.; Downie, J. Technometrics 1979, 21, 291-298.
(13) Stewart, Warren E.; Sørensen, Jan P. Technometrics 1981, 23, 131-141.
(14) Stewart, Warren E.; Sørensen, Jan P. "Sensitivity and Regression of Multicomponent Reactor Models", In Fourth International Symposium on Chemical Reaction Engineering, Frankfurt; DECHEMA, 1-12-1-20.
(15) Box, George E. P.; Tiao, George C. "Bayesian Inference in Statistical Analysis"; Addison-Wesley: Reading, MA, 1973; Chapter 2.
(16) International Mathematical and Statistical Libraries, Inc. IMSL, Houston, TX, 1980, Version 8.
(17) Box, M. J. Comput. J. 1966, 9, 67-77.
(18) Powell, M. J. D. Comput. J. 1964, 7, 155-162.

RECEIVED for review August 10, 1982. Resubmitted and accepted May 2, 1983.
