Strategy for resolving rapid scanning wavelength experiments by principal component analysis. R. N. Cochran, and F. H. Horne. J. Phys. Chem. , 1980, 8...

J. Phys. Chem. 1980, 84, 2561-2567

Strategy for Resolving Rapid Scanning Wavelength Experiments by Principal Component Analysis

R. N. Cochran and F. H. Horne

Department of Chemistry, Michigan State University, East Lansing, Michigan 48824 (Received: June 15, 1979; In Final Form: May 12, 1980)

Principal component analysis is a method for determining the minimum number of absorbers in a rapid scanning wavelength kinetics experiment. A strategy is presented for inferring from a rapid scanning experiment the spectral and kinetic properties of its absorbers, once their number has been determined by principal component analysis. Least-squares fitting criteria are developed for testing whether measured static spectra of known or suspected absorbing species account for absorbers in the experiment. When measured static spectra do not account for the required number of absorbers, the experiment is partitioned into wavelength and time subspaces wherein the number of absorbers is small. The simplified analysis of one-, two-, and three-absorber subspaces is presented. In the final step of the strategy, information obtained from fitting measured static spectra and from analyzing subspaces of the experiment is reassembled to estimate for the whole experiment the static spectra and concentration profiles of its absorbers.

Introduction

Rapid scanning spectroscopy is a powerful experimental tool for characterizing the progress of reactions that have more than one light-absorbing species. In a rapid scanning wavelength absorbance kinetics experiment, absorbance is measured at p wavelength channels across a spectral region that is rapidly and repeatedly scanned as the reaction progresses. If the scanning rate is fast compared to the fastest absorbance change in the reaction, the resulting N consecutive spectra may be regarded as instantaneous, and the absorbance-wavelength-time surface may be represented by the (p × N) matrix A, where $A_{ij}$ is the absorbance measured at wavelength channel i at the time of scan j. Since multicomponent reactions in general produce unstable intermediates as well as unanticipated products, the number of absorbers in a rapid scanning experiment is generally unknown before the experiment is analyzed. Moreover, the spectra of these intermediates and products are generally unknown.

A previous paper (ref 2) dealt with the use of statistically weighted principal component analysis to determine the number of absorbers in a rapid scanning kinetics experiment. The two kinds of principal component analyses, M analysis and S analysis, yield respectively the minimum number m of absorbers required to interpret the experiment and the minimum number s of absorbers whose concentrations must have changed during the experiment. In this paper, we develop further tools and propose a strategy for inferring from a rapid scanning experiment the spectral and kinetic properties of its absorbers. We do not in this paper discuss the use of these properties in testing mechanistic hypotheses.
The first step in the strategy is the determination of the minimum number of absorbers for the whole experiment. In the second step, the measured static spectrum of each known or suspected reactant, product, and catalyst is tested by a least-squares fitting criterion to determine whether it accounts for one of the absorbers. If the measured static spectra fit and account for the m required absorbers, then we have resolved the experiment. If, however, intermediates and products remain whose static spectra and concentrations are unknown, additional steps are required. We examine subspaces of wavelength and time to find parts of the experiment where the number q of absorbers is small. When q is small, the analysis is considerably simplified, and further information about static spectra and concentrations is obtained for each subspace. The final step is to reassemble the information from the subspaces in order to estimate the static spectra and the concentrations for the whole experiment.

As before (ref 2), we assume that A satisfies the model given by eq 1,

$$A = FC^T + \epsilon = \sum_{j=1}^{q} f_j c_j^T + \epsilon \qquad (1)$$

where q is the number of absorbers, $FC^T$ is the errorless contribution of Beer's law, and $\epsilon$ is the matrix of uncorrelated random measurement errors. The static spectrum matrix F is defined by $F = (f_1, f_2, \ldots, f_q)$, where $f_j$, the static spectrum of absorber j, is a p-component column vector whose ith element is the product of the absorbance-cell path length and the molar absorptivity of absorber j at wavelength channel i. The concentration matrix C is defined by $C = (c_1, c_2, \ldots, c_q)$, where $c_j$, the concentration vector of absorber j, is an N-component column vector whose ith element is the molarity of absorber j during spectrum i. The i,jth element of $\epsilon$ is assumed to have expected value zero and variance satisfying the model

$$\mathrm{var}(\epsilon_{ij}) = \sigma_{ij}^2 = x_i z_j \qquad (2)$$

From the eigenvectors and eigenvalues of the M analysis, we develop criteria for estimating F and C to within arbitrary multiplicative constants. We find that $q^2$ elements of F and/or C suffice for estimating the (N + p)q elements of F and C. This is a great reduction in the arbitrariness of these matrices since (N + p)q is generally much larger than $q^2$. Moreover, we find a least-squares criterion for deciding, independently of any assumptions about other absorbers or kinetic mechanisms, whether a single suspected absorber's static spectrum or concentration vector fits the experimental data. Each suspected static spectrum or concentration vector that fits provides q of the $q^2$ numbers required to estimate F and C.

The problem of estimating F and C takes simpler forms when q is small. We present the simplifications for an experiment or any part of an experiment that contains only one, two, or three absorbers. For experiments containing only two absorbers, we present equations that require no additional information to define the upper and lower bounds of the static spectra and concentration vectors.
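As a rough illustration of the model in eq 1 and 2, the following sketch builds a small synthetic absorbance matrix. Everything numerical here is an assumption made for illustration and does not come from the paper: the Gaussian band shapes, the first-order conversion of absorber 1 into absorber 2, the constant values of $x_i$ and $z_j$, and the function name `simulate_experiment`.

```python
import numpy as np

def simulate_experiment(p=40, N=25, noise=1e-3, seed=0):
    """Build A = F C^T + eps for a hypothetical two-absorber reaction."""
    rng = np.random.default_rng(seed)
    channels = np.linspace(0.0, 1.0, p)      # rescaled wavelength channels
    times = np.linspace(0.0, 5.0, N)         # scan times, arbitrary units
    # Static spectra f_j (path length times molar absorptivity): invented bands.
    F = np.column_stack([
        np.exp(-((channels - 0.3) / 0.10) ** 2),
        np.exp(-((channels - 0.6) / 0.15) ** 2),
    ])                                       # shape (p, q)
    # Concentration vectors c_j: absorber 1 decays into absorber 2 (assumed).
    C = np.column_stack([np.exp(-times), 1.0 - np.exp(-times)])  # shape (N, q)
    # Error variances follow eq 2: var(eps_ij) = x_i * z_j.
    x = np.full(p, noise)
    z = np.full(N, noise)
    eps = rng.normal(size=(p, N)) * np.sqrt(np.outer(x, z))
    A = F @ C.T + eps                        # eq 1
    return A, F, C, x, z
```

The errorless part `F @ C.T` equals the sum of the outer products $f_j c_j^T$, which is the second form of eq 1.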

© 1980 American Chemical Society

The Journal of Physical Chemistry, Vol. 84, No. 20, 1980

These two-absorber equations are extensions of those presented by Lawton and Sylvestre (ref 3) for the individual spectra in two-absorber equilibrium systems and are similar to those presented by Warner et al. (ref 4) for emission and excitation spectra in two-component fluorescence data. The upper and lower bounds define bands of acceptable values of static spectra and concentration vectors. From the eigenvectors and eigenvalues of the S analysis, we develop a least-squares criterion for determining whether a suspected absorber whose static spectrum fits the data has a rate in the reaction that is linearly independent of the rates of the other (q − 1) absorbers. Finally, these tools are utilized in a seven-step strategy proposed for resolving rapid scanning kinetics experiments by principal component analysis. In the following paper (ref 5), the strategy is applied to a reaction catalyzed by the enzyme horse liver alcohol dehydrogenase. The analysis shows that there are at least seven absorbers, of which four are transient intermediates.

M Analysis Estimates of F and C. In M analysis we form the weighted second moment matrix $M_w$, defined by (ref 2)

$$M_w = (1/N) A_w A_w^T, \quad A_w = LAT, \quad L = a^{1/2}\,\mathrm{diag}(x_1^{-1/2}, x_2^{-1/2}, \ldots, x_p^{-1/2}), \quad T = b^{1/2}\,\mathrm{diag}(z_1^{-1/2}, z_2^{-1/2}, \ldots, z_N^{-1/2}) \qquad (3)$$

where a and b are arbitrary positive constants. The eigenvalue equation to be solved is

$$M_w \Phi = \Phi \Delta, \quad \Phi = (\phi_1, \phi_2, \ldots, \phi_p), \quad \Delta = \mathrm{diag}(\delta_1, \delta_2, \ldots, \delta_p), \quad \delta_1 \ge \delta_2 \ge \cdots \ge \delta_p \qquad (4)$$

The essential rank m is the minimum number of absorbers required to interpret the experiment and is determined by finding the lowest value of r for which $A_{(r)}$, defined by eq 5, fits the experimental matrix A to within its random errors.

$$A_{(r)} = (L^{-1}\Phi_{(r)}\Phi_{(r)}^T L) A, \quad \Phi_{(r)} = (\phi_1, \phi_2, \ldots, \phi_r) \qquad (5)$$

An identical but more useful equation for $A_{(r)}$ is eq 6.

$$A_{(r)} = L^{-1}\Phi_{(r)}\Omega_{(r)}\Psi_{(r)}^T T^{-1}, \quad \Omega_{(r)} = \mathrm{diag}(\omega_1, \omega_2, \ldots, \omega_r), \quad \omega_j = +(N\delta_j)^{1/2}, \quad \Psi_{(r)} = (\psi_1, \psi_2, \ldots, \psi_r) = A_w^T \Phi_{(r)} \Omega_{(r)}^{-1} \qquad (6)$$

It can be shown (ref 6) that the r columns of $\Psi_{(r)}$ contain the first r eigenvectors of $M_w'$, defined by $M_w' = (1/p) A_w^T A_w$.

The essential rank m is not only the minimum number of absorbers but also the maximum number of absorbers whose spectral and kinetic properties can be determined by analyzing a particular experiment. In some experiments there may be extra absorbers that cannot be distinguished because their concentrations or their static spectra are linearly coupled. Additional experiments, in which conditions are varied, are required to detect and characterize such additional absorbers. We assume, for a particular experiment, that m is the total number of absorbers, i.e., that q equals m. The model for M analysis estimates of F and C is eq 7,

$$A_{(m)} = \hat{F}\hat{C}^T \qquad (7)$$

where $A_{(m)}$ is given by eq 6, and $\hat{F}\hat{C}^T$ is an estimate of $FC^T$ in eq 1. We use eq 7 to estimate for each absorber the shape of its static spectrum, the shape of its concentration profile, and its contribution to the measured absorbance-wavelength-time surface. $\hat{F}$ is defined by $\hat{F} = (\hat{f}_1, \hat{f}_2, \ldots, \hat{f}_m)$, where $\hat{f}_j$ is an estimate of $\alpha_j f_j$, with $\alpha_j$ an arbitrary constant. $\hat{C}$ is defined by $\hat{C} = (\hat{c}_1, \hat{c}_2, \ldots, \hat{c}_m)$, where $\hat{c}_j$ is an estimate of $(1/\alpha_j) c_j$. The vector product $\hat{f}_j \hat{c}_j^T$ is then an estimate of $f_j c_j^T$, the contribution of the jth absorber to the measured absorbance surface.

Whereas F and C are errorless, $\hat{F}$ and $\hat{C}$ are not, in general, errorless. To see this, define the residual absorbance matrix for m absorbers as $R_{(m)} = A - A_{(m)}$. The terms $\hat{F}\hat{C}^T$ and $FC^T$ are then related by $\hat{F}\hat{C}^T = FC^T + (\epsilon - R_{(m)})$. In an errorless experiment, where $\epsilon$ is zero, the term $(\epsilon - R_{(m)})$ is also zero, and $\hat{F}\hat{C}^T$ equals $FC^T$. In an experiment with errors, where $\epsilon$ is nonzero, $(\epsilon - R_{(m)})$ is in general also nonzero, and $\hat{F}\hat{C}^T$ does not equal $FC^T$. However, it can be shown (ref 6) that $(\epsilon - R_{(m)})$ vanishes in a properly weighted M analysis as either N, the number of consecutive spectra, or p, the number of wavelength channels, becomes very large. Therefore, we take $\hat{F}\hat{C}^T$ to be an error-containing estimate of $FC^T$. Equations 6 and 7 together give

$$\hat{F}\hat{C}^T = L^{-1}\Phi_{(m)}\Omega_{(m)}\Psi_{(m)}^T T^{-1} \qquad (8)$$

Solved for $\hat{F}$, eq 8 gives

$$\hat{F} = (L^{-1}\Phi_{(m)}) U \qquad (9)$$

where U is an (m × m) matrix defined by

$$U = \Omega_{(m)}\Psi_{(m)}^T T^{-1}\hat{C}(\hat{C}^T\hat{C})^{-1} \qquad (10)$$

Solved for $\hat{C}$, eq 8 gives

$$\hat{C} = (T^{-1}\Psi_{(m)}) V \qquad (11)$$

where V is an (m × m) matrix defined by

$$V = \Omega_{(m)}\Phi_{(m)}^T L^{-1}\hat{F}(\hat{F}^T\hat{F})^{-1} \qquad (12)$$

Note that eq 9 and 11 assume, respectively, the existence of the inverses $(\hat{C}^T\hat{C})^{-1}$ and $(\hat{F}^T\hat{F})^{-1}$. These inverses are guaranteed since m is the essential rank of $M_w$. Resubstitution of eq 9 and 11 into eq 8 gives the following condition on U and V:

$$UV^T = \Omega_{(m)} \qquad (13)$$

Equations 9-13 considerably reduce the arbitrariness of $\hat{F}$ and $\hat{C}$. $\hat{F}$ is a rotation of $L^{-1}\Phi_{(m)}$ by the rotation matrix U, and $\hat{C}$ is a rotation of $T^{-1}\Psi_{(m)}$ by the rotation matrix V. If the $m^2$ elements of either U or V are known, the unknown rotation matrix is given by eq 13, and $\hat{F}$ and $\hat{C}$ can be computed directly from eq 9 and 11, respectively. Thus, the strategy for obtaining $\hat{F}$ and $\hat{C}$ is to estimate enough elements of U and V separately so that all of U and V are given by eq 13. To this end, we partition U and V

$$U = (u_1, u_2, \ldots, u_m), \quad V = (v_1, v_2, \ldots, v_m) \qquad (14)$$

where each $u_j$ and each $v_j$ is an m-component column vector. Then eq 9 and 11 become

$$\hat{f}_j = (L^{-1}\Phi_{(m)}) u_j, \quad \hat{c}_j = (T^{-1}\Psi_{(m)}) v_j, \quad j = 1, 2, \ldots, m \qquad (15)$$
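The M analysis quantities in eq 3 through 6 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' program: the function names, the synthetic two-absorber data, and the choice a = b = 1 are all hypothetical, and the columns of $\Psi_{(r)}$ are computed as $A_w^T \phi_j / \omega_j$.

```python
import numpy as np

def m_analysis(A, x, z, a=1.0, b=1.0):
    """Weighted M analysis of a (p x N) absorbance matrix A (eq 3 and 4)."""
    N = A.shape[1]
    L = np.sqrt(a) * np.diag(x ** -0.5)      # wavelength (row) weights
    T = np.sqrt(b) * np.diag(z ** -0.5)      # time (column) weights
    Aw = L @ A @ T
    Mw = (Aw @ Aw.T) / N                     # weighted second moment matrix
    delta, Phi = np.linalg.eigh(Mw)          # eigh returns ascending eigenvalues
    order = np.argsort(delta)[::-1]          # reorder so delta_1 >= delta_2 >= ...
    return L, T, Aw, delta[order], Phi[:, order]

def reconstruct(L, T, Aw, delta, Phi, r):
    """Rank-r matrix A_(r) in the factored form of eq 6."""
    N = Aw.shape[1]
    omega = np.sqrt(N * delta[:r])
    Phi_r = Phi[:, :r]
    Psi_r = (Aw.T @ Phi_r) / omega           # column j is A_w^T phi_j / omega_j
    A_r = np.linalg.inv(L) @ Phi_r @ np.diag(omega) @ Psi_r.T @ np.linalg.inv(T)
    return A_r, omega, Phi_r, Psi_r

def weighted_rms(A, A_r, L, T):
    """RMS residual in weighted units (of order one when a = b = 1 and A_(r) fits)."""
    return float(np.sqrt(np.mean((L @ (A - A_r) @ T) ** 2)))

# Hypothetical two-absorber data: invented spectra and first-order kinetics.
rng = np.random.default_rng(1)
p, N = 30, 20
t = np.linspace(0.0, 3.0, N)
F = np.column_stack([np.linspace(1.0, 2.0, p), np.linspace(2.0, 0.5, p)])
C = np.column_stack([np.exp(-t), 1.0 - np.exp(-t)])
x, z = np.full(p, 1e-3), np.full(N, 1e-3)
A = F @ C.T + rng.normal(size=(p, N)) * np.sqrt(np.outer(x, z))

L, T, Aw, delta, Phi = m_analysis(A, x, z)
rms_by_rank = {r: weighted_rms(A, reconstruct(L, T, Aw, delta, Phi, r)[0], L, T)
               for r in (1, 2, 3)}
print(rms_by_rank)
```

In this sketch the weighted residual drops sharply between r = 1 and r = 2 and changes little afterwards, which is the behavior used to pick the essential rank m. How close to the noise level the residual must come before r is accepted is a statistical judgment that the sketch does not attempt to settle.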

Equations 15 show that the jth columns of U and V depend only on the jth static spectrum and concentration vector, respectively. For a given reaction there is usually a set of suspected absorbers. For example, in an enzyme-catalyzed reaction the suspected absorbers are any light-absorbing substrates, inhibitors, and enzymes that were mixed to initiate the reaction. Equations 15 are models against which spectral and kinetic information about suspected absorbers can be tested. If a suspected absorber is one of the m linearly independent absorbers in the experiment, its static spectrum and concentration vector must satisfy eq 15.

We now present least-squares equations based on eq 15 for determining whether a suspected absorber fits as one of the m linearly independent absorbers in the experiment. Suppose there are proposed values for k wavelength channels of $\hat{f}_j$, with m ≤ k ≤ p. These proposed values may come from the measured static spectrum of a suspected absorber. Define the proposed static spectrum as the p-component column vector $\hat{f}_j^{\mathrm{prop}}$, where the k channels for which there are proposed values contain those values, and where the remaining (p − k) wavelength channels contain zeros. The least-squares loss function for using $\hat{f}_j^{\mathrm{prop}}$ to estimate $u_j$ in the first equation of eq 15 is given by eq 16,

$$Q_{LS} = (\hat{f}_j^{\mathrm{prop}} - L^{-1}\Phi_{(m)}u_j)^T W_f (\hat{f}_j^{\mathrm{prop}} - L^{-1}\Phi_{(m)}u_j), \quad W_f = \mathrm{diag}(W_{f1}, W_{f2}, \ldots, W_{fp}) \qquad (16)$$

where $W_{fi}$ is unity if there is a proposed value for channel i and is zero otherwise. The least-squares estimate of $u_j$ is the value for which $Q_{LS}$ is minimized with respect to $u_j$ and is given by eq 17.

$$\hat{u}_j^{LS} = P_f^{-1}\Phi_{(m)}^T L^{-1} W_f \hat{f}_j^{\mathrm{prop}} \qquad (17)$$

Minimizing eq 16 requires $P_f = \Phi_{(m)}^T L^{-1} W_f L^{-1} \Phi_{(m)}$. Then the M analysis estimate of $\hat{f}_j$, on the assumption that the suspected absorber fits as one of the m absorbers in the experiment, is given by eq 18.

$$\hat{f}_j^{LSM} = (L^{-1}\Phi_{(m)})\hat{u}_j^{LS} \qquad (18)$$

Note that whereas $\hat{f}_j^{\mathrm{prop}}$ contains proposed values for only k wavelength channels, $\hat{f}_j^{LSM}$ contains estimated values for all p channels. If $\hat{f}_j^{LSM}$ fits $\hat{f}_j^{\mathrm{prop}}$ to within random error at the wavelength channels for which both contain values, then $\hat{f}_j^{\mathrm{prop}}$ satisfies the first equation of eq 15, and we conclude that the suspected absorber fits as one of the m linearly independent absorbers in the experiment. Moreover, $\hat{u}_j^{LS}$ is then taken to be an estimate of the jth column of U. If $\hat{f}_j^{LSM}$ and $\hat{f}_j^{\mathrm{prop}}$ do not fit each other to within random error, then $\hat{f}_j^{\mathrm{prop}}$ does not satisfy the first equation of eq 15 and we conclude that the suspected absorber does not fit as one of the m linearly independent absorbers.

The second equation of eq 15 gives a similar least-squares equation for using proposed values for k elements of $\hat{c}_j$ to determine whether a suspected absorber fits as one of the m linearly independent absorbers in the experiment. Define the proposed concentration vector as $\hat{c}_j^{\mathrm{prop}}$, where the k times for which there are proposed values contain those values, and where the remaining (N − k) times contain zeros. The least-squares estimate of $v_j$ in the second equation of eq 15 is given by eq 19,

$$\hat{v}_j^{LS} = P_c^{-1}\Psi_{(m)}^T T^{-1} W_c \hat{c}_j^{\mathrm{prop}}, \quad W_c = \mathrm{diag}(W_{c1}, W_{c2}, \ldots, W_{cN}) \qquad (19)$$

where $W_{ci}$ is unity if there is a proposed value for time i and is zero otherwise, and where the least-squares minimization requires $P_c = \Psi_{(m)}^T T^{-1} W_c T^{-1} \Psi_{(m)}$. The M analysis estimate of $\hat{c}_j$ on the assumption that the suspected absorber fits as one of the m linearly independent absorbers in the experiment is given by eq 20.

$$\hat{c}_j^{LSM} = (T^{-1}\Psi_{(m)})\hat{v}_j^{LS} \qquad (20)$$

If $\hat{c}_j^{LSM}$ fits $\hat{c}_j^{\mathrm{prop}}$ at the k times for which both contain values, we conclude that the suspected absorber fits as one of the m linearly independent absorbers, and we take $\hat{v}_j^{LS}$ to be the jth column of V.

Two notes of caution are necessary regarding the correct interpretation of these least-squares criteria. First, the fit of a suspected absorber as one of the m linearly independent absorbers shows that the experiment can be interpreted by using the suspected absorber as one of the absorbers but does not prove that the suspected absorber is an absorber in the experiment. Secondly, if a suspected absorber does not fit as one of the m linearly independent absorbers, it may still be an absorber in the experiment if there are actually more than m absorbers and if the static spectrum or concentration vector of the suspected absorber is linearly dependent on the static spectra or concentration vectors of other absorbers. An example of the effect of linear dependence is treated in Appendix A.

The problem of estimating F and C simplifies when the number of absorbers is small. In the next three sections we review the simplifications for one-, two-, and three-absorber experiments. These simplifications apply also to any subspace of wavelength and time that contains only one, two, or three absorbers, even though the overall experiment contains more than three absorbers.

One-Absorber Simplifications. The estimation of $\hat{F}$ and $\hat{C}$ in a one-absorber experiment is trivial. The matrices U and V reduce to scalars. Equations 15 and 13 become

$$\hat{f}_1 = (L^{-1}\phi_1)u_{11}, \quad \hat{c}_1 = (T^{-1}\psi_1)v_{11}, \quad u_{11}v_{11} = \omega_1 \qquad (21)$$

Since $\hat{f}_1$ is an estimate of $f_1$ to within an arbitrary multiplicative constant $\alpha_1$, we may take $\hat{f}_1$ to be any multiple of $L^{-1}\phi_1$. Arbitrarily setting $u_{11}$ equal to one, so that $v_{11} = \omega_1$, we obtain from eq 21

$$\hat{f}_1 = L^{-1}\phi_1, \quad \hat{c}_1 = \omega_1 T^{-1}\psi_1 \qquad (22)$$

To determine $\alpha_1$ so that the molar absorptivities and concentrations of absorber one are known requires either the nonzero molarity of absorber one at any time during the experiment or the product of the nonzero molar absorptivity of absorber one at any wavelength channel times the absorbance-cell path length.

Two-Absorber Simplifications. The simplifications for two-absorber experiments give the following three results: (1) When the shape of one absorber's static spectrum is known, the shape of the other absorber's concentration profile can be computed directly. (2) When the shape of one absorber's concentration profile is known, the shape of the other absorber's static spectrum can be computed directly. (3) Even when there is no outside information about the static spectra and concentration profiles, their upper and lower bounds can be computed directly from the M analysis eigenvectors and eigenvalues. In a two-absorber experiment U and V are the (2 × 2) matrices $U = (u_1, u_2)$ and $V = (v_1, v_2)$.

The following relationships between the vectors in U and V can be derived from eq 13:

$$v_1 = [1/\det(U)]\,\Omega_{(2)} K u_2, \quad v_2 = -[1/\det(U)]\,\Omega_{(2)} K u_1$$

$$u_1 = [1/\det(V)]\,\Omega_{(2)} K v_2, \quad u_2 = -[1/\det(V)]\,\Omega_{(2)} K v_1$$


TABLE I: Solution Bands for Normalized Static Spectra
[table entries illegible in source]

TABLE II: Solution Bands for Normalized Concentration Vectors
[table entries illegible in source]