Numerical analysis of multicomponent spectra - Analytical Chemistry

Romà Tauler , Enric Casassas , Anna Izquierdo-Ridorsa. Analytica Chimica Acta 1991 ... G. Sala , S. Maspotch , H. Iturriaga , M. Blanco , V. Cerda. J...
2 downloads 0 Views 739KB Size
Numerical Analysis of Multicomponent Spectra D. J. Leggett’ University of Waterloo, Waterloo, Ontarlo, Canada

Non-negative linear least squares and simplex optlmlzatlon algorithms have been applied to the analysis of multlcomponent spectrophotometrlc data. Both types of constrained numerical analysis overcome the problem of negative molar absorptlvltles or concentratlons that may arise when multiple regression analysis, or a similar numerical method, Is employed to solve the simultaneous equations arlslng from Beer’s law. The incorporationof the constrained least squares algorithm in the computer program SQUAD to compute stablllty constants from spectrophotometrlc data is discussed. The appllcabllity of both methods is Illustrated by the analysis of the spectral data derlved from acld-base Indicators In order to determine the composition of mixtures. Ionization constants for bromocresol green, phenol red, and methyl orange, In 0.100 M KCI at 25.0 “C, are determined as 4.59,7.69, and 3.41, respectively.

The analysis of spectrophotometric data to provide molar absorptivities, or concentrations, of the individual absorbing species has been the basis of colorimetric analysis for many decades and, more recently, applied to the evaluation of stability constants. In both instances the majority of studies have been restricted to absorbance values obtained at a limited number of wavelengths, typically peak maxima or isosbestic points. However, since the Beer-Lambert law is not formulated on such limitations, it is theoretically possible to utilize the entire spectral region within which the particular species absorb radiation. Thus, for a matrix of absorbance values obtained at N W wavelengths from N S solutions we have NC

Aik = C cijejk

(1)

j= 1

(assuming 1.0-cm pathlength) where NC is the number of absorbing components, ci; is the concentration of the j t h component in the ith solution and ejk is the molar absorptivity of the j t h species at the k t h wavelength. Sternberg et al. illustrated the use of Equation 1 to determine the composition of steroid mixtures arising from the isomerization of ergosterol to calciferol (1). Several workers (2-7) have employed Equation 1to compute stability constank from spectral data. This equation has also been used to determine, by rank analysis (4,8-14) and factor analysis (15), the number of absorbing species. The general topic of curve resolution has been recently discussed by Shoenfeld and DeVoe (16). Various nonlinear least squares algorithms have been developed, or modified, to evaluate the minimum value of U , N S NW

NC

u = ix= l kc= l (Aik - j = 1 cijejk)

2

Since cij is a function of the stability constants, minimization of Equation 2 leads to the “best” set of constants that describe the available data. It has been noted by several authors (2-4, 6, 7) that during the minimization certain elements of the elk matrix may become negative. Metzler ( 2 )has suggested that in this situation a new function, U,, is defined 1 Present address, University of Toronto, Erindale College, Mississauga, Ontario, Canada L5L 1C6.

276

ANALYTICAL CHEMISTRY, VOL. 49, NO. 2, FEBRUARY 1977

where e;,+are the negative elements of elk. This technique attempts to redirect the search for a minimum value for U whenever a negative elk is encountered. Kankare ( 4 ) has adopted the same stratagem. S i l l h (31, on the other hand, permits the location of an absolute minimum value for U , which may involve negative elements of the elk matrix. Once the “best” set of stability constants has been located, a subroutine is invoked that recalculates the elk matrix, based on the current set of constants but does not permit a negative value for any element. Kaden’s program SPANA (5) automatically sets any negative element to zero. Leggett has recently published a computer program SQUAD ( 7 ) and in the accompanying manual several methods are suggested to overcome the calculation of negative molar absorptivities. They involve data manipulation, rather than a mathematically rigorous approach, to overcome this problem. Lawton and Sylvestre overcome the appearance of negative elements in the solution matrix by their technique of selfmodeling curve resolution (17,18) and expand this to handle more than two overlapping curves (19). They employ a combination of nonlinear regression and principal component analysis. Macnaughton, Rogers, and Wernimont (20) have applied Lawton’s self-modeling method (17 ) to chromatographic data. A step-by-step outline is provided enabling programs to be written to perform the necessary deconvolution of the data. This publication presents two methods, constrained linear least squares or simplex optimization to prevent the calculation of negative elements in the solution vector. The techniques described here are not intended as a replacement for other methods previously mentioned but rather as an alternative tool. Non-Negative Linear Least Squares. The appearance of negative values in the solution matrix elk (if Alk and cLlare known) is a natural occurrence of dealing with data subject to experimental error. This is particularly prevalent when the A,k values arise from a minimum or extremum in the overall spectrum or from any component species spectrum. A similar situation may arise if cLIis sought (assuming ALkand elk are known). Consider an unconstrained linear least squares solution of an over-determined system as represented by Equation 1. Such a system of equations may be solved by multiple regression (17,21). The solution to Equation 1is given by

E = (CtC)-lCtA (4) where E, C , and A are matrix notations for elk, clJ, and Alk. This equation is simple to use for all wavelengths since ( C t C ) - W need only be computed once and subsequently employed for all NW wavelengths. Equation 4 was used in SQUAD to compute E. Multiple regression has also been used by Sternberg ( 1 )to compute C by the related equation C = AEt(EEt)-l

(5)

In both studies the appearance of negative elements in the solution matrix was noted. Non-negative linear least squares will compute a solution matrix for Equation 1 with the condition that E 1 0. This

technique does not arbitrarily set any negative elements to zero but will adjust all non-positive elements to either new positive or zero values. Lawson and Hanson (22)have developed an algorithm that will satisfy the constraint E 2 0. A brief description of the echnique follows. Two index sets are constructed; variables indexed in set Z are held a t zero while those indexed in set P may have values other than zero. If any variable in P takes a nonpositive value, then the algorithm will either move the variable to a new positive value or set that variable to zero and move its index from set P to set Z. There are two loops in the algorithm. The first step in the outer loop selects a coefficient not presently in P (index t E 2 ) that will be positive if introduced into the temporary solution vector z. The equations are then solved (see Appendix) within the inner loop. Returning to the outer loop, if all other components of x, indexed in set P, remain positive, then z becomes part of the solution vector x. Control returns to the beginning of the outer loop, set P is augmented and set Z diminished by the transfer of index t . Should some coefficient indexed in set P become zero or negative in the vector z during the solution of the equations, the algorithm remains in the inner loop. The particular coefficient x is replaced by x a(z - x), 0 < z 5 1, where a is chosen to be as large as possible without the new x becoming negative. This process is repeated until all components of P are positive. It should be noted that the same constraint may be imposed on the solution of an overdetermined set of linear equations by linear programming. Pratt ( 2 3 ) and Herskovits (24) applied this method to process DNA spectra. However, it has been the author’s experience that linear programming is difficult to implement in a general manner and, by comparison with Lawson’s algorithm, rather slow to converge. Simplex Optimization. Deming has recently illustrated the application of simplex optimization to experimental parameters (25-28), and indicated its value to numerical analysis (29). By contrast with most linear and nonlinear numerical analytical procedures, simplex optimization is conceptually simple and the calculations involved are trivial. Since the method has been described in detail elsewhere (29) only an outline will be presented here, applicable to the minimization of the function

Factor C, Figure 1. Possible simplex moves

+

where NC = 2 (for example). For all rational values of c j and e j h , we may plot as a function of c1 and c2 giving an error surface which at some pair of c 1 and c2 will exhibit a minimum value for U . Starting with an initial guess for c1, c2 (vertex A, Figure l), we can evaluate U , i.e., U(A).Now re-evaluate U at two other pairs of c, (vertices B and C) denoting the resulting values of U as U(B) and U(C). Vertices A, B, and C constitute the initial simplex. The sums of squares of residuals U , for each vertex are ranked as highest, next highest, and lowest. For example U(A) > U(B) > U(C).The following rules are applied to create a new simplex. (1) Reject U(A) as having the worst response. (2) Evaluate U(E)-a reflection. (3) If U(E) < U(C), evaluate U(J)-an expansion. (i) If U(J) 5 U(C),the new simplex is BCJ. Go to rule 6. (ii) If U(J) > U(C), the new simplex is BCE. Go to rule 6. (4) If U(B) L U(E) 2 U ( C ) ,the new simplex is BCE. Go to rule 6. (5) If U(E) > U(B).

u

I

1

(i) If U(E) > U(A), evaluate U(G).The new simplex is BCG. Go to rule 6. (ii) If U(A) 2 U(E) 2 U(B), evaluate U(H). The new simplex is BCH. Go to rule 6. (6) If the new vertex (E, J, H, or G) has the highest value of U in the new simplex, then reject the next to worst vertex in this simplex and go to rule 2. If this condition is false, return to rule 1 and repeat the operations. Figure 1 shows the initial simplex and possible points for movement towards a lower value of U. The computer program developed here is based on that by O’Neill (30) from the original algorithm of Nelder and Mead (31). Construction of the original simplex follows the technique suggested by Yarbro and Deming (32). During the progress of the simplex towards the lowest value of U , it is possible that one of the parameters sought may become negative or take on some other undesirable numerical value. To prevent this, a test is made on these parameters and if any lie outside the permitted boundaries, 1050is automatically assigned to U. The attempted move, as dictated by that set of parameters is therefore not accepted and the simplex is forced to move along or away from the imposed boundary. Simplex optimization is in essence opportunist, only current information being used. Consequently it is found that around the minimum the progress of the algorithm tends to slow down. In general, although not always, the simplex converges more slowly than other algorithms. However, the simplicity and lack of assumptions about the function together with the ease by which constraints may be placed on the parameters, make it an attractive alternative to the more widely used algorithms. The applicability of non-negative linear least squares and simplex optimization has been tested by processing spectrophotometric data from mixtures of acid base indicators.

EXPERIMENTAL Aqueous solutions of bromocresol green (BCG),phenol red (PR), methyl orange (MO)and indigo carmine (IC) were prepared and the ionic strength adjusted to 0.100 M with KC1. The spectra of individual indicators and mixtures were measured, as a function of pH, with a dual-beam Cary 14 recording spectrophotometer (purged with dry nitrogen). The baseline was obtained for a pair of matched, quartz, 10.0-mm pathlength cells before any spectra were measured and after sets of three spectra had been obtained. There was no observable evidence of baseline drift. All baseline data so obtained were averaged for each wavelength and the mean value was subtracted from each absorbance at the appropriate wavelength prior to any data processing. Measurements of pH were made with an Orion Research 801 pH-meter previously calibrated as a hydrogen ion concentration probe (33). All spectra were digitized at 10.0-nm intervals and the data processed by an IBM 370/165 computer. The pK,’s and molar absorptivities reported here relate to 25.0 f 0.1 O C .

ANALYTICAL CHEMISTRY, VOL. 49, NO.

2, FEBRUARY 1977

277

Table I. Molar Absorptivities of BCG and PR Computed by Multiple Regression

,2

Flgure 2.

Wavelength, nm

BCG-

H(BCG)

PR-

H(PR)

370.0 380.0 390.0 400.0 410.0 420.0 430.0 440.0 450.0 460.0 470.0 480.0 490.0 500.0 510.0 520.0 530.0 540.0 550.0 560.0 570.0 580.0 590.0 600.0 610.0 620.0 630.0 640.0 650.0 660.0

-3.5863 6' -3.3043 6 -7.5163 5 -4.0963 6 -1.8573 6 -1.0253 6 -4.2833 6 8.4133 5 7.4513 4 -1,5203 6 -1.7533 6 -1.0513 6 -1,4163 6 -1.0493 6 -2.8273 6 -1.2213 6 2.2543 6 2.3983 6 5.2143 6 3.1923 6 1.1803 5 -9.2913 5 -2.2493 6 -1.7463 6 -1.4073 6 -3.9123 5 1.5183 6 7.2723 5 1.3643 6 1.2833 6

-3.5833 6 -3.3023 6 -7.4943 5 -4.0923 6 -1.8503 6 -1.0143 6 -4.2683 6 8.5853 5 9.2403 4 -1.5033 6 -1.7383 6 -1.0393 6 -1.4083 6 -1.0443 6 -2.8273 6 -1.2253 6 2.2463 6 2.3863 6 5.1973 6 3.1723 6 9.2963 4 -9.5903 5 -2.2843 6 -1.7873 6 -1.4523 6 -4.3613 5 1.4793 6 6.9733 5 1.3443 6 1.2723 6

3.6933 6 3.4023 6 7.8153 5 4.2153 6 1.9153 6 1.0603 6 4.4033 6 -8.5843 5 -7.0733 4 1.5683 6 1.8093 6 1.0913 6 1.4713 6 1.0993 6 2.9323 6 1.2913 6 -2.2673 6 -2.4043 6 -5.2783 6 -3.1943 6 -5.0713 4 1.0053 6 2.3523 6 1.8373 6 1.4923 6 4.4793 5 -1.418E 6 -7.1583 5 -1.3803 6 -1.3063 6

3.6953 6 3.4103 6 7.9413 5 4.2313 6 1.9333 6 1.0793 6 4.4233 6 -8.3993 5 -5.5443 4 1.5783 6 1.8143 6 1.091E 6 1.4643 6 1.0863 6 2.9113 6 1.2633 6 -2.3023 6 -2.4473 6 -5.3343 6 -3.2553 6 -9.4903 4 9.8473 5 2.3443 6 1.8343 6 1.491E 6 4.4743 6 -1.5183 6 -7.1583 5 -1.3793 6 -1.3063 6

Molar absorptivities for bromocresol green (25.0 O C ; 0.10 M

KCI). Curve 1, H(BCG); curve 2, BCG-

2

a

WRVELENGTH (NM1

Flgure 3. Molar absorptivities for phenol red (25.0 O C ; 0.10 M KCI). Curve 1, H(PR); curve 2, PR-

RESULTS AND DISCUSSION Evaluation of Molar Absorptivities. Separate sets of pH-dependent data, derived from solutions of the individual indicators BCG and PR, were processed by SQUAD to check the performance of the subroutine SOLVE2 (the constrained linear least squares algorithm). Figures 2 and 3 show the molar absorptivities of BCG and PR. Data derived from mixtures of BCG and PR (18 different pH's and 30 wavelengths) were analyzed by SQUAD using either constrained least squares (SOLVEI) or multiple regression (SOLVE1). These ilidicators were chosen to illustrate the applicability of SOLVE2 for two reasons. First, the spectra of protonated and unprotonated forms of both indicators are well separated (440 nm and 620 nm for BCG; 432 nm and 560 nm for PR) and second, the acid forms have virtually the same A., From a numerical standpoint, analysis of spectra, derived particularly from mixtures of BCG and PR, is frequently encumbered by both the wide separation of component species spectra and the closely overlapping bands of both protonated forms. 278

The notation E 6 implies X lo6.

The molar absorptivities computed by the original version of SQUAD (with SOLVE^), are presented in Table 1. Those obtained when SOLVE2 replaces S O L V E i are in excellent agreement with the values displayed in Figures 2 and 3, obtained from spectra of single indicators. To further emphasize the problems that may be encountered with the multiple regression method, part of the computer output, derived from both algorithms, is displayed in Tab12 11. This comparison shows that the absorbances calculated with multiple regression values of E are in both cases closer to the observed values than when calculated with constrained least squares values. However, the computed molar absorptivities are obviously meaningless when multiple regression is employed to solve the overdetermined linear equations. I t is also significant that the difference between observed and calculated absorbance values, derived from either algorithm, is less than the manufacturer's specifications for precision of the spectrophotometer, but the small difference in computed absorbance values leads to either reasonable or meaningless molar absorptivities. Data shown in Table I11 indicate that there is a decrease in precision with which the pK,'s can be evaluated from indicator mixtures and that the standard deviation of the entire fit (UDATA) is slightly poorer. This decrease in precision is a consequence of not accepting a solution vector for Equation 1 that contains negative elements. The failure of unconstrained least squares was not evident in the original publication of SQUAD (7). In particular, the species spectra of the nickel-ethylene-diamine system (Figure 6, ref. 7) did not exhibit either of the numerical problems mentioned earlier. It is only when attempting to analyze data

ANALYTICAL CHEMISTRY, VOL. 49, NO. 2, FEBRUARY 1977

Table 11. Results of Analyzing Data from Indicator Mixtures by Both Algorithms

2

Computed concentrations of species Solution number

[BCG-1 IPR-I [H(BCG)I [H(PR)I

5(pH 4.399)

10(pH 6.490)

5.2773-6 6.2843-9 9.2733-6 1.4173-5

1.4343-5 7.3503-7 2.0443-7 1.3443-5

Molar absorptivities (at 570 nm) calculated by both algorithms

BCGPRH(BCG) H(PR)

SOLVEl

SOLVE2

1.1803 5 -5.0713 4 9.2963 4 -9.4903 4

2.5423 4 4.6933 4 2.4443 2 3.4703 2

Absorbance values (at 570 nm) for two solutions

Solution 5 Solution 10

Figure 4. Molar absorptivities for methyl orange and indigo carmine

SOLVEl

SOLVEP

Observed

0.1398 0.3991

0.1416 0.4038

0.139 0.401

(25.0 'C; 0.01 M KCI). Curve 1, MO-; curve 2, H(M0); curve 3, IC

Calculation of Concentrations. Data from mixtures of methyl orange (MO) and indigo carmine (IC) were processed by both constrained least squares and simplex optimization to determine the concentrations of each component. The pH range was chosen to bracket the ionization of MO but did not exceed pH 4.5. This ensured that IC existed only in the protonated form. The pK, and molar absorptivities for MO were previously determined using SQUAD (subroutine SOLVES). The pK,, by this method, was found to be 3.412 f 0.002 and QATA = 0.002 absorbance unit on 300 data points. Molar absorptivities for IC were determined from ten solutions to 4.125 X having concentrations in the range 4.613 X M. The data were processed by linear regression analysis assuming some error on both absorbance and concentration values. The molar absorptivities for MO and IC as a function of wavelength are shown in Figure 4. The experimental parameters and results of the numerical analysis are collected in Table IV. The precision and reproducibility of the method is seen to be 1 or 2 parts per thousand. This, and the absence of any negative elements in the solution vector, indicate that in fact a confident estimate of the concentrations of absorbing components could have been obtained from analysis of any single spectrum. The agreement among the twelve pK, values

Table 111. Comparison of Computed pK,'s Derived from Both Algorithms System examined BCG only PR only BCG + PR BCG + PR

Subroutine SOLVEP SOLVE2 SOLVEl SOLVE2

PKZi BCG PR UDATA 4.595 f 0.001 *.. 0.002 ... 7.695 & 0.002 0.003 4.596 f 0.003 7.705 f 0.003 0.003 4.612 f 0.006 7.723 f 0.005 0.007

in which the numerical contribution of one or more components approaches zero that multiple regression analysis tends to fail. For instance the four species present in solution give rise to 90.91% (BCG-), 8.61% (PR-), 0.01% (HBCG), and 1.17% (HPR) contributions to the observed absorbance a t 570 nm and pH 6.49. This situation is often encountered with metal ligand interaction studies performed over a wide p H range.

Table IV. Analysis of Methyl Orange-Indigo Carmine Mixtures [MO-] 1 2 3

4 5 6 7 8 9 10 11

12

3.458 3.718 3.993 4.215 3.161 3.340 3.593 3.671 3.536 3.722 4.020 4.448

1.2553-5 1.5703-5 1.8323-5 1.9823-5 4.4273-6 5.4823-6 7.1453-6 7.5973-8 1.344E-5 1.5813-5 1.8583-5 2.0843-5

1.0123-5 6.9563-6 4.3283-6 2.7853-6 6.8193-6 5.7063-6 4.0763-6 3.6093-6 8.9283-6 6.6273-6 3.7963-6 1.6003-6

+ [HMO]

2.2683-5 2.2663-5 2.2653-5 2.2613-5 1.1253-5 1.1193-5 1.1223-5 1.1213-5 2.2373-5 2.2433-5 2.2383-5 2.2443-5

[IC1

Mean concentration" and standard deviation

pKab

1.7293-5 1.7273-5 1.7273-5 1.7293-5 1.7093-5 1.7133-5 1.7163-5 1.7133-5 8.5543-6 8.6293-6 8.6223-6 8.5693-6

CMO= 2.2653 5 f 2.9E 8 CIC= 1.7283 5 & 1.23 8 CMO= 1.12183 5 & 02.53 8 CIC= 1.7133 5 f 2.93 8 CMO= 2.2413 5 f 3.53 8 CIC= 8.5943 5 f 3.73 8

3.435 3.435 3.437 3.433 3.413 3.428 3.419 3.418 3.429 3.415 3.419 3.417

Known compositions of the measured solutions are: solutions 1-4 and 9-12 are 2.433-5 M, solutions 5-8 are 1.223-5 M in MO; solutions 1-8 are 1.7533-5 M, solutions 9-12 are 8.7653-6 M in IC. Calculated from the concentrations shown in this table for MO . ANALYTICAL CHEMISTRY, VOL. 49, NO. 2, FEBRUARY 1977

279

simplex optimization) is embodied in a large program, such as SQUAD, it is the responsibility of the investigator to ensure that the chosen model describes the observations as well as can be expected. This may be ascertained by inspection of such parameters as standard deviations of computed stability constants, goodness of fit of the computed spectra, etc. It has been clearly demonstrated here that the mere presence of negative molar absorptivities is not always indicative of the wrong model. Nor is it justifiable to set those negative values to zero and continue with the computations. The results obtained here show that it is possible to solve a system of simultaneous equations by either a constrained linear least squares or simplex optimization, subject to the condition that the solution does not contain negative elements. No additional techniques are required, nor is it necessary arbitrarily to divide the data into regions that, for instance, correspond to the nearby exclusive presence of some of the species contributing to the observed phenomenon and thereby to ignore the presence of the minor component(s). All programs alluded to in this publication are available from the author at nominal charge.

-KEY-

0 ECGA PRH (BCGI X H LPRI

+

Figure 5. Comparison of theoretical and calculated (NNLS) concentrations of BCG and PR

and the small difference between the mean value of 3.425 compared to the value determined by SQUADprovide both an internal and external check on the consistency of both methods. A second set of spectra derived from mixtures of BCG and PR were processed to obtain concentrations of all species present. The molar absorptivities were those computed earlier by SOLVEZ. Constrained least squares and simplex algorithms yielded identical numerical values for these concentrations. When multiple regression analysis was performed on the same data set, four computed concentrations were negative, all others being identical. In Figure 5 are presented the results of the constrained least squares analysis of these data. The solid lines indicate the theoretical pH dependency of BCG and PR species based on their pK,’s, determined in this study, and the analytical concentrations of each indicator. The symbols show the computed concentrations of each species derived from constrained least squares analysis. Table V compares this form of analysis with multiple regression analysis for the instances where the latter technique gave negative concentrations. It is of interest to note that with the same sized data array (18 solutions and 30 wavelengths) only four sets (of 18) yielded negative concentrations whereas 60 (of 120) yielded negative molar absorptivities. Although this comparison should not be stressed too heavily since the 540 absorbance values are derived from the different sets of data, it is significant that there are 30 simultaneous equations to be solved, yielding four concentrations (per spectrum), but only 18 equations to give four molar absorptivities (per wavelength). The above discussion, has the implicit assumption that the correct model has been used to set up the particular algorithm. When concentrations are to be computed by these techniques, particularly for quality control, there is frequently no doubt as to the number of species present. However, when constrained least squares analysis (or

APPENDIX

The Solution of Ax P G Subject to kn > 0. Let A be an m X n matrix of rank n and let G be an m-vector satisfying

with w > 0. If it is the least squares solution of Ax = b, then P, > 0 when kn denotes the nth component of P. Proof. Let Q be an m X n orthogonal matrix that zeros the subdiagonal elements in the first n - 1 columns of A, thus

s u( 1 n-1 t vlJm-n+l 1 1

IR Q [ A : b ] - 10 n 1 n-1

I

(2)

-

where R is upper triangular and nonsingular. Since Q is orthogonal the conditions of Equation 1imply that

RTu=O

(3)

sTu+tTv=w>O

(4)

and Since R is nonsingular, Equation 3 implies that u = 0. Therefore Equation 4 reduces to tTv=u>O

(5)

From Equation 2 it follows that the nth component I, of the solution vector k is the least squares solution of the reduced problem tx,

z

v

(6)

Table V. A Comparison of Concentrations Computed by Constrained Least Squares (CLS) and Multiple Regression (MR) Analysis Solution No.

PH

10 11 12

6.490 6.965 7.291 9.317

18

280

[BCG-] x CLS

M MR

[H(BCG)] x M CLS MR

[PR-] x CLS

M MR

[H(PR)] x CLS

M MR

1.430 1.440 1.449 1.461

1.470 1.477 1.481 1.481

3.214 2.207 1.519 0.271

0.757 1.911 3.497 13.13

0.756 1.917 3.492 13.14

12.84 11.53

12.84 11.52 9.883 0.172

ANALYTICAL CHEMISTRY, VOL. 49, NO. 2, FEBRUARY 1977

-0.752 -1.469 -1.665 -1.758

9.878 0.184

Can. J. Chem., 51, 626 (1973). (19) E. A. Sylvestre, W. H. Lawton, and M. S. Maggio, Technometrics, 16,353 (1974). (20) D. Macnaughton, Jr., L. B. Rogers, and G. Wernimont, Anal. Chem., 44, 1421 (1972). (21) 0. L. Davies and P. L. Goldsmith, "Statistical Methods in Research and Produdtbn", 4th ed., Oliver and Boyd, Edinburgh, Scotland, 1972. (22) C. L. Lawson and R. J. Hanson, "Solving Least Squares Problems", Prentice-Hall, inc., Englewood Cliffs, N.J., 1974. (23) A. W. Pratt and J. N. Rushizky, Biochemistry, 3, 1831 (1964).(24) T. T. Herskovits and M. Sorensen, Biochemistry, 7, 2523 (1968). (25) S. L. Morgan and S. N. Deming, Anal. Chem., 46, 1170 (1974). (26) S. N. Deming and P. G. King, Anal. Chem., 46, 1476 (1974). (27) L. R. Parker, Jr., S. L. Morgan, and S. N. Deming, Appl. Spectrosc., 29, 429 (1975). (28) W. K. Dean, K. J. Heald, and S. N. Deming, Science, 189, 805 (1975). (29) S. N. Deming and S. L. Morgan, Anal. Cbem., 45, 278A (1973). (30) R. O'Neill, App. Statist., 20, 338 (1971). (31) J. A. Nelder and R. Mead, Comput. J., 7, 308 (1965). (32) L. A. Yarbroand S. N. Deming, Anal. Chim. Acta, 73, 391 (1974). (33) D. J. Leggett and W. A. E. McBryde, Talanta, 21, 1005 (1974).

Since the pseudoinverse of the column vector t is t T/(tTt), the solution of Equation 6 is given by

kn = tTv/tTt = W/tTt

>0

(7)

LITERATURE CITED ( I ) J. C. Sternberg, H. S. Stillo, and R. H. Schwendeman, Anal. Chem., 32, 84 (2) (3) (4) (5) (6) (7) (6)

(9) 10) 11) 12) 13) 14) 15) 16\

.17i (18)

(1960). K. Nagano and D. E. Metzler, J. Am. Chem. Soc., 89, 2891 (1967). L. G. Sillen and E. Warnqvist, Ark. Kerni, 31, 337 (1968). J. J. Kankare, Anal. Chem., 42, 1322 (1970). T. Kaden and A. Zuberbuhler, Talanta, 18, 61 (1971). S. Feldberg, P. Klotz, and L. Newman, lnorg. Cbem., 11, 2860 (1972). D. J. Leggett and W. A. E. McBryde, Anal. Chem., 47, 1065 (1975). R. M. Wallace, J. Pbys. Chem., 64, 899 (1960). G. Weber, Nature (London), 190, 27 (1961). S. Ainsworth, J. Phys. Chem., 65, 968 (1961). S. Ainsworth, J. Phys. Chem., 67, 1613(1963). D. Katakis, Anal. Chem., 37, 876 (1965). G. Wernimont, Anal. Chem., 39, 554 (1967). L. P. Varga, Anal. Chem., 39, 1101 (1967). J. T. Bulmer and H. F. Shurvell, J. Phys. Cbem., 77, 256 (1973). P S_ . Shoenfeld . . .. and J. R. DeVoe. Anal. Chem.. 48. 403R (1976). W H Lawton and E. A. Sylvestre, Technometrics, 13, 617 (1971) R. L. Reeves, R. S. Kaiser, M. S. Maggio, E.A. Sylvestre, and W. H. Lawton,

RECEIVEDfer review March 29, 1976. Accepted October 27, 1976. This work was supported by grants from the National Research Council of Canada (A-0025 and A-1051).

Calculations of Isotopic Distribution in Molecules Extensively Labeled with Heavy Isotopes H. Yamamoto and James A. McCloskey" Department of Biopharmaceutical Sciences, University of Utah, Salt Lake City, Utah 84 172

Equations are presented which permit derivation of isotopic distribution relationships for molecules that are extensively labeled with heavy isotopes, and permit calculation of the percent enrichment required to achieve any desired level of all-light isotopic species. The relationship between isotopic enrichment, number of atoms, and fraction of unlabeledmolecules is examined.

Increased availability of stable isotopes in a variety of forms ( 1 ) has been accompanied by sharply increased interest in their applications to chemical and biological problems (2,3 ) . In particular, the use of stable isotope labeled internal standards for analysis by gas chromatography-mass spectrometry with the selected ion monitoring technique ( 4 , 5 )has been a major analytical development in the biomedical sciences. These studies have heightened interest in the chemical or biological preparation of extensively labeled substrates, and therefore lead to the desirability of calculating the distribution of isotopic species and of examining the relationships among the major variables. The biological preparation of uniformly and extensively labeled materials is of particular interest because of its ability to produce many labeled compounds for which chemical synthesis would be difficult or expensive. In addition, extensively labeled molecules are necessary to serve as carrier-internal standards for work at high isotopic dilution (e.g., Ref. 5 ) in which the amount of unlabeled species in the standard material must be reduced to an acceptably low level in order to avoid serious background corrections. In the case of either chemical or biological synthesis with heavy isotopes, it may be desirable to calculate the theoretical isotopic distribution in advance of synthesis. These calculations become increasingly difficult as the molecular size and number of labeled atoms increase. In the present report we

present an extended and specialized treatment of the standard probability equation for determining the abundances of isotopic species as a function of mass (6).These modified equations permit derivation of the complete mass-abundance pattern envelope for relatively complex and extensively labeled molecules, and permit calculation of the percent enrichment required to achieve any desired level of unlabeled component (usually the lowest mass species) for a given molecular elemental composition. The equations described in the following section have been written in FORTRAN for a PDP-11/40 computer and occupy 12K words of core. The examples shown were derived with the aid of the computer, using the following isotopic abundances: 1H, 99.985%; ZH, 0.015%; IT!, 98.893%; 13C, 1.107%; 14N, 99.634%; 16N,0.366%; l6O, 99.759% 170,0.037%, ISO, 0.204%; x2S,95.00%; j3S,0.76%; 34S,4.22%. Calculation of Isotopic Distribution as a Function of Mass and Isotopic Abundance. Beynon has described (6) the probability ( P ) that no heavy isotopes will occur in a molecule of composition C,H,N,O, as

where c, h, n, and o represent the percent abundances of heavy isotopes of carbon, hydrogen, nitrogen, and oxygen ( 1 7 0 and ISO),respectively. Expansion of Equation 1to expressions for the first isotopic species (one heavy isotope) and second isotopic species was also presented (6), with numerical values shown in tabular form (7). These probability expressions can be generalized to describe the probability of occurrence of any two-isotope element (e.g., hydrogen):

ANALYTICAL CHEMISTRY, VOL. 49, NO. 2 , FEBRUARY 1977

281