Matrix Rank Analysis of Spectral Data. Analytical Chemistry (ACS)

Matrix Rank Analysis of Spectral Data

DEMETRIUS KATAKIS
Nuclear Research Center Democritus, Aghia Paraskevi Attikis, Athens, Greece

A computer method is described for the determination of the rank of high order matrices. The method is suitable for the determination of the number of independent variables (e.g., concentrations) in a complicated chemical system, under various conditions, by using the matrices of the appropriate parameters. The errors are discussed in some detail, and statistical criteria, which take these errors into account, are proposed for the determination of the rank of a matrix of experimental data. The method also makes possible the detection of a systematic error in the original data. A numerical example of analysis of spectrophotometric data on the Cr+2-maleic acid system at various times and wavelengths illustrates the use of the method, including the use of the proposed statistical criteria for the determination of the rank.

A least squares method in matrix form has been used by Sternberg, Stillo, and Schwendeman (4) for the analysis of ultraviolet spectrophotometric data of multicomponent systems. The rank of the matrix of the response of the fluorescence detector at various excitation and emission wavelengths has been used by Weber (5) as a quantitative criterion for the number of independent fluorescent components. The rank of the matrix of absorbances at various wavelengths and concentrations has been used by Ainsworth (1) in the analysis of spectrophotometric data. The method of matrix rank analysis is general when collective properties of a multicomponent system are studied; it can be applied in various fields by choosing the appropriate parameters.

The study of multicomponent systems usually involves the isolation and separate study of each or of a small number of components or variables, and subsequent logical treatment of the findings in order to reach conclusions for the system as a whole. This procedure has the disadvantage that it cannot always predict all the possible interactions in the total system. Detailed, simultaneous study of all the components or variables is not always possible, at least with the present experimental techniques. A computer method, however, for the determination of the rank of high order matrices makes practically possible the study of at least some aspects of the total system. Thus, if in a complicated chemical mixture the study of the change in concentration of each component as a function of some variable (e.g., time, pH, etc.) is not convenient or possible, one can at least use matrix rank analysis to obtain information about the change in the number of independent components as a function of these variables, and, therefore, have direct evidence about the equilibria established in the system, the mechanism of the reactions taking place, the formation of unknown products, etc. In fact, the method is useful even for relatively simple systems if separation and identification of components present experimental difficulties.

The determination of the rank of a matrix by computing the determinants derived from it becomes a cumbersome and practically impossible task for matrices of high order. In this paper a computer method for the determination of the rank is described, with special emphasis on its application in treating spectrophotometric data. However, the usefulness of the method is not limited to spectrophotometry. Thus, it can be applied in γ-spectrometry on the matrix of the counting rate data at various energies and times. This matrix is the row by column product of the matrix of the decay constants and the matrix of the number of nuclei of each species; its rank, therefore, gives the number of independent species. The method is particularly useful in γ-spectrometry if instrumental differentiation of the various activities or radiochemical separation is not easy.

In spectrophotometric studies of multicomponent systems a generalized form of the Lambert-Beer law can be used, represented by the matrix equation (1)

$$EC = A$$

where E is the matrix of the absorptivities of the components of the system at various wavelengths and various conditions (e.g., at various times in the case of a reacting system), C is the corresponding concentration matrix, and A is the absorbance matrix. The rank of C is, in general, invariant under the transformation represented by the above equation, being therefore the same as the rank of A, and it gives the number of linearly independent components of the system. In the use of rank for the enumeration of the independent components, components are counted only when their contribution to the magnitude of the matrix elements exceeds experimental error.

METHOD OF CALCULATION AND OUTLINE OF COMPUTER PROGRAMMING

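The method of computation of the rank of A is based on the Gauss process of elimination (3).

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$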

The program for the machine runs as follows:

Find the largest element of A and bring it into leading position by interchanging the row and column where it belongs with the first row and column, respectively. If there are two or more equal figures that can be brought into leading position, take any one of them. The indices of the elements of A and the derived matrices refer to the arrangement after the performance of the interchange.

Compute the simple product $z_1 y_1'$, where $z_1$ is the first column of A and $y_1'$ is the first row of A divided by the pivotal element $a_{11}$; that is, compute the matrix:

$$S_1 = z_1 y_1', \qquad (S_1)_{ij} = \frac{a_{i1} a_{1j}}{a_{11}}$$

The computation need be done only from the second row and the second column on.

Subtract $S_1$ from the original matrix A (namely, subtract each element of $S_1$ from the corresponding element of A), thus obtaining $Q_1$:

$$Q_1 = A - S_1, \qquad q_{ij}^{(1)} = a_{ij} - \frac{a_{i1} a_{1j}}{a_{11}}$$
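The single elimination step described above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the original SPS program: the function name `eliminate_once` is hypothetical, and the pivot is chosen as the element largest in absolute value, which coincides with "the largest element" for a matrix of positive absorbances.

```python
import numpy as np

def eliminate_once(A):
    """One elimination step with full pivoting (illustrative sketch).

    Returns (Q1, Q1_bar): Q1 = A - S1 after the pivot interchange, and
    Q1_bar, the matrix of order n - 1 left after dropping the zero
    first row and column of Q1.
    """
    A = np.array(A, dtype=float)  # work on a copy
    # Find the element largest in absolute value (assumption: the paper
    # speaks of "the largest element"; absorbances are positive).
    i, j = np.unravel_index(np.argmax(np.abs(A)), A.shape)
    # Bring it into leading position by a row and a column interchange.
    A[[0, i]] = A[[i, 0]]
    A[:, [0, j]] = A[:, [j, 0]]
    z1 = A[:, 0]            # first column of A
    y1 = A[0, :] / A[0, 0]  # first row divided by the pivotal element
    S1 = np.outer(z1, y1)   # simple product z1 y1'
    Q1 = A - S1             # first row and column are zero
    return Q1, Q1[1:, 1:]
```

For a matrix of exact rank 2, for example `[[1, 0, 1], [0, 1, 1], [1, 1, 2]]`, the returned `Q1_bar` has rank 1, in line with the statement that each step diminishes the rank by one.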

$2e/q_{11}^{(s-1)}$, where $q_{11}^{(s-1)}$ is the leading element of $\bar{Q}_{s-1}$. The factor 2 arises from the fact that each element $q_{ij}^{(s)}$ of $Q_s$ results from a subtraction of two products of the appropriate elements of $Q_{s-1}$. The rounding-off error, $e$, is assumed to be 1/2

Figure 1. Graphical expansion of matrix Q1 (Table III) taken row after row. Horizontal axis: number of element (0 to 60); vertical axis: value of the element (-0.04 to 0.04).

Total Error. The total error comes from a combination of the computational error and the propagated error of measurement.

During the computation of a row of $Q_1$, take the row into the memory and transfer the old row of A to another region of the memory where the error is to be computed. This need be done only from the second row and column of $Q_1$ on, the first row and column being zero. Other columns and rows that happen to be zero can also be omitted, provided that certain criteria referring to the error are met, as described below.

Print the matrix $\bar{Q}_1$, namely, the matrix of order n - 1 obtained from $Q_1$ by omitting the zero first row and column.

Calculate the error matrix for $\bar{Q}_1$ according to Equation 2, and print it.

Repeat the procedure with $\bar{Q}_1$, $\bar{Q}_2$, etc., until a matrix $\bar{Q}_m$ is obtained having all its elements absolutely less than the corresponding elements of the error matrix. Mark this matrix, but continue the calculation for two or three more steps for checking purposes.

At every step of the above procedure the rank is diminished by one:

$$\operatorname{rank} \bar{Q}_s = \operatorname{rank} A - s$$

and if

unit of the last decimal (third decimal in the numerical example given in this paper). Computational error can be avoided if more figures than the significant ones are used during the computation.

Errors of Measurement. Denote by $\varepsilon_{ij}^{(s)}$ the error of the element $q_{ij}^{(s)}$ in matrix $Q_s$. $\varepsilon_{ij}^{(0)}$ refers to the original matrix A and is an assumed error of measurement (e.g., 2%). $\varepsilon_{ij}^{(s)}$ for $s \neq 0$ comes from the propagation of error from $Q_{s-1}$ to $Q_s$. The first row and column of $Q_1$ are zero and their error is also zero, since they are formed by subtraction of identical elements. The only errors, therefore, that should be computed are the errors in the other rows and columns of $Q_1$, that is, the errors in the elements of $\bar{Q}_1$. The maximum of the absolute value of $\varepsilon_{ij}^{(s)}$ is given approximately by Equation 1.

$\bar{Q}_m$ vanishes ($\operatorname{rank} \bar{Q}_m = 0$),

$$\operatorname{rank} A = m$$
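The repeated elimination and the vanishing criterion can be sketched as follows. This is a hedged illustration, not the author's program: the function name `estimate_rank` and the parameter `rel_err` are hypothetical, and the error matrix is propagated with a generic first-order worst-case bound, which is an assumption standing in for the paper's own error equations (not legible in this copy).

```python
import numpy as np

def estimate_rank(A, rel_err=0.02):
    """Estimate the rank of A by elimination with an error matrix (sketch).

    rel_err is the assumed relative error of measurement (2% by default).
    The error propagation below is a generic first-order worst-case bound,
    an assumption, not necessarily the paper's Equation 1.
    """
    Q = np.array(A, dtype=float)
    E = rel_err * np.abs(Q)  # assumed error of the original elements
    rank = 0
    while Q.size > 0:
        i, j = np.unravel_index(np.argmax(np.abs(Q)), Q.shape)
        # Vanishing criterion: every element absolutely less than its error.
        if Q[i, j] == 0 or np.all(np.abs(Q) < E):
            break
        # Interchange rows and columns of both Q and its error matrix.
        Q[[0, i]] = Q[[i, 0]]
        E[[0, i]] = E[[i, 0]]
        Q[:, [0, j]] = Q[:, [j, 0]]
        E[:, [0, j]] = E[:, [j, 0]]
        p = Q[0, 0]
        col, row = Q[1:, :1], Q[:1, 1:]
        ecol, erow = E[1:, :1], E[:1, 1:]
        # First-order bound on the error of q_ij - q_i1 * q_1j / q_11.
        E = (E[1:, 1:] + np.abs(row / p) * ecol + np.abs(col / p) * erow
             + np.abs(col * row / p ** 2) * E[0, 0])
        Q = Q[1:, 1:] - col * row / p  # reduced matrix of order n - 1
        rank += 1
    return rank
```

With a synthetic absorbance matrix built as the product of a two-column absorptivity matrix and a two-row concentration matrix, the sketch stops after two steps and returns 2, the number of independent components.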

A systematic error in the elements of the original matrix is also propagated during the process in a manner analogous to that with the random error, retaining, however, its nonrandomness. In fact, the method provides a means of detecting such error in the original data. The row of the derived systematic error matrix is proportional to the difference of two rows in the previous matrix, and therefore any periodicity in the rows is preserved.

OTHER CRITERIA FOR VANISHING OF $\bar{Q}_m$

If the magnitude of the error is comparable to the magnitude of the elements themselves, the criterion that for a vanishing $\bar{Q}_m$ all its elements should be absolutely smaller than the corresponding elements of the error matrix is not

The criterion for the vanishing of $\bar{Q}_m$ is that all its elements are absolutely smaller than the corresponding elements of the error matrix. In the numerical example given below more printings were done than indicated above, for illustrative purposes.

ERRORS IN DETERMINATION

Two kinds of errors should be distinguished: computational errors arising from the rounding off of the figures to a fixed number of decimals, and errors of measurement because of which the elements of matrix A are known only within certain limits of accuracy. The two kinds of errors are added to give the total error.

Computational Errors. Every element of $Q_s$ calculated from the appropriate elements of $Q_{s-1}$ has a computational error absolutely less than

The probability of the signs combining during the computation to give the maximum error, therefore, becomes smaller and smaller, and the error found in the elements of $Q_s$ diminishes as the computation proceeds, provided that the magnitude of the error is relatively small compared to the magnitude of the elements. If the two magnitudes become comparable, as very often happens in practice after the first step, Equation 1 no longer holds. In this case a better approximation is obtained if the absolute value of a term

is added to the value given by Equation 1.

sufficient. Additional criteria that can be used in that case are the following:

The numerical average of all minors of $\bar{Q}_m$ should be zero. In particular, the average element should be very close to zero. Of all minors, the minors of order 1 (the elements) have the largest population and are more convenient to use.

The elements of a vanishing $\bar{Q}_m$ should be randomly distributed around zero; nonrandomness should not be detectable. Tests for nonrandomness that can be applied include the use of charts, the use of the theory of runs, the mean square difference, and the serial correlation (2). The application of the first two tests in the determination of the rank of a matrix is illustrated in the numerical example.
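The test based on the theory of runs can be sketched as follows. The function names `count_runs` and `expected_runs` are hypothetical; the mean and variance are the standard Wald-Wolfowitz large-sample expressions, which may differ from the tabulated critical values the author took from reference (2). Dropping values equal to the median is one common convention and is an assumption here.

```python
from statistics import median

def count_runs(values):
    """Count runs above and below the median of the sequence.

    Values equal to the median are dropped (an assumed convention).
    Returns (runs, n_above, n_below).
    """
    med = median(values)
    signs = [v > med for v in values if v != med]
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:])) if signs else 0
    n_above = sum(signs)
    return runs, n_above, len(signs) - n_above

def expected_runs(n_above, n_below):
    """Mean and variance of the number of runs for a random sequence."""
    n = n_above + n_below
    mean = 2 * n_above * n_below / n + 1
    var = 2 * n_above * n_below * (2 * n_above * n_below - n) / (n ** 2 * (n - 1))
    return mean, var
```

As a rough check against the numerical example, and assuming 30 elements on each side of the median (the exact split is not stated), `expected_runs(30, 30)` gives a mean of 31 runs with a standard deviation near 3.8, so the 20 runs reported there lie almost three standard deviations below expectation.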

VOL. 37, NO. 7, JUNE 1965


Table I. Matrix A after Largest Element Has Been Brought into Leading Position

1.205 1.222 1.247 1.259
0.632 0.638 0.644 0.628
0.798 0.808 0.818 0.787
0.728 0.739 0.743 0.715
0.454 0.462 0.470 0.449
0.586 0.590 0.595 0.580
1.190 1.198 1.205 1.180
0.400 0.403 0.408 0.393
0.675 0.683 0.659 0.668

1.280 0.665 0.839 0.762 0.481 0.612 1.240 0.421 0.697

Table II. 0.078 0.081 0.080 0.075 0.077 . ~ 0.090 0.074 0.078

0.081 0.085 0.084 0.078 n.081 . . 0.094 0.077 0.082

.

~

0.080 0.084 0.083 0.077 ~ ~

~

Error Matrix of

0.075 0.078 0.077 0.072 0.074 0.087 0.070 0.075

o.nm ~~~

0.093 0.076 0.081

Table 111. -0.010 -0.009 -0.003 -0.007 -0.006

-0.003 -0.003 0.001 -0.005 0.002 0.006 -0.002 0.003

0.002 - 0.003 -0.002 -0.004 0.004 0.013 - 0.003 0.003

0.077 0.081 0.080 0.074 0.077 .. . 0.090 0.073 0.078 ~

Matrix

-0.010 -0.007 -0.006 - 0.003 - 0.007 -0.015 - 0.006 -0.003

-0.010 -0.007 -0.004

$$(n-1)^2 + (n-2)^2 + \cdots + (n-m)^2 = n^2 m - nm(m+1) + \frac{m(m+1)(2m+1)}{6}$$

Subtractions. The same as multiplications.

Recordings on Machine. Omitting the recording of A itself:

$$2\left[(n-1)^2 + (n-2)^2 + \cdots + (n-m)^2\right] + n + (n-1) + \cdots + (n-m+1) = 2\left[n^2 m - nm(m+1) + \frac{m(m+1)(2m+1)}{6}\right] + nm - \frac{(m-1)m}{2}$$

$$(n-1)^2 + (n-2)^2 + \cdots + (n-m)^2 = n^2 m - nm(m+1) + \frac{m(m+1)(2m+1)}{6}$$

(Same as the number of multiplications)
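The operation counts can be checked numerically by comparing the explicit sums with their closed forms. The sketch below is illustrative only; the function names are hypothetical, and the division count n + (n - 1) + ... + (n - m + 1) = nm - (m - 1)m/2 is taken as an assumption reconstructed from the fragments of the formula that survive in this copy.

```python
def operation_counts(n, m):
    """Operation counts for m elimination steps on an n x n matrix, by direct summation."""
    divisions = sum(n - k for k in range(m))  # n + (n-1) + ... + (n-m+1)
    multiplications = sum((n - k) ** 2 for k in range(1, m + 1))
    return {
        "divisions": divisions,
        "multiplications": multiplications,
        "subtractions": multiplications,  # same as multiplications
        "machine_recordings": 2 * multiplications + divisions,
        "written_recordings": multiplications,
    }

def closed_forms(n, m):
    """The same counts from the closed-form expressions."""
    divisions = n * m - (m - 1) * m // 2
    multiplications = n * n * m - n * m * (m + 1) + m * (m + 1) * (2 * m + 1) // 6
    return {
        "divisions": divisions,
        "multiplications": multiplications,
        "subtractions": multiplications,
        "machine_recordings": 2 * multiplications + divisions,
        "written_recordings": multiplications,
    }
```

For a 9 x 9 matrix carried through two steps, both functions give 17 divisions and 113 multiplications.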

,

is

0.002

- 0.002 -0.005 -0.005 0.010 0.020 - 0.001 0.002

+

The hypothesis, therefore, of an extreme value can be discarded on the 0.05 significance level.

The graphical expansion of the matrix Q1 taken row after row is given in Figure 1. It is obvious from this figure that there is a small displacement of zero (also indicated by the average) and a periodicity, both indicating a systematic error in the original data.

The median for the first 60 elements of the matrix is -0.003 (coincides with the mean value). The number of runs above and below the median is equal to 20. Within a 0.01 significance level the number of runs should be 21 (2). The data, therefore, are significantly nonrandom at the 0.01 level, probably because of a systematic error, as mentioned above. If the calculation is carried to more steps, the periodicity and the displacement of the zero are preserved.

ACKNOWLEDGMENT

The author thanks Ch. Siadimas, National Statistics Service of Greece, for making the SPS program and performing the calculations on the computer.

LITERATURE CITED

(Twice the number of multiplications plus the number of divisions)

Recordings in Writing.

-0.000 - 0.003 -0,001 -0.001 -0.003 -0.008 -0.002 -0.003

+ ( n - mI21 +

. . . n+(n-l)+, n2m - nm(m

(sa

-0.004 - 0.004 -0.004 0.001 -0.007 -0.013 -0.004 -0.003

nmn

0.078 0.082 0.081 0.075 n.n78 0.091 0.074 0.079

The matrix A (Table I) used as a numerical example is composed of the absorbances of a Cr+2-maleic acid reaction mixture in 1X perchloric acid. The rows in matrix A refer to different wavelengths, the columns to different times after mixing. The example is intended as an illustration of the application of the method, rather than as a full demonstration of its potential uses. The Cr+2-maleic acid system is relatively simple, but the separation of the components at specified time intervals, in order to study the kinetics of the reaction, presents experimental difficulties. The

- 1)m

Multiplications.

+

0.103 0.086 0.091

0.074 0.077 0.076 0.070 n.07.1 0.086 0.069 0.074

NUMERICAL EXAMPLE

(n-m+l)=nm-

(n - 1 ) 2

0.090 0.094 0.093 0.087

To the above number, the number of operations for computing and recording the errors should be added.

The total numbers of operations for the calculation of matrix $\bar{Q}_m$ are:

Divisions.

$$n + (n-1) + \cdots + (n-m+1) = nm - \frac{(m-1)m}{2}$$

the significant ones. The elements of $\bar{Q}_1$ are absolutely much smaller than the corresponding elements of the error matrix (Table II). The calculation therefore need not be carried further than $\bar{Q}_1$. The error matrix for $\bar{Q}_1$ was calculated on the assumption that the experimental error in the elements of A was 2%.

The criterion that the average numerical value of the elements of $\bar{Q}_1$ should be close to zero is fulfilled. This average value is -0.003 (see, however, below).

The tests for nonrandomness give the following results: The value of:

-0.008 -0.003 - 0.005 - 0.001 -0.006 -0.015 -0.006 -0.001

NUMBER OF OPERATIONS

n+(n-l)+

Table I (continued)

1.275 1.279 1.188 1.268
0.651 0.658 0.664 0.619
0.828 0.832 0.835 0.777
0.750 0.755 0.760 0.702
0.475 0.480 0.480 0.441
0.603 0.609 0.600 0.578
1.222 1.231 1.171 1.213
0.411 0.415 0.419 0.390
0.691 0.693 0.649 0.689

+

I+

(m,- 1)m nm 2

determination of the rank of matrix A at the various stages of the reaction, however, can give information about the number of species present and the relationships between their concentrations, which, combined with other observations, can be used to elucidate the mechanism of the reaction. The computation, which was done on the computer of the National Statistics Service of Greece, has been carried to more figures than

(1) Ainsworth, S., J. Phys. Chem. 65, 1968 (1961); 67, 1613 (1963).
(2) Bennett, C. A., Franklin, N. L., "Statistical Analysis in Chemistry and the Chemical Industry," Wiley, London, 1963.
(3) Bodewig, E., "Matrix Calculus," North-Holland Publishing Co., Amsterdam, 1959.
(4) Sternberg, J. C., Stillo, H. S., Schwendeman, R. H., ANAL. CHEM. 32, 84 (1960).
(5) Weber, G., Nature 190, 27 (1961).

RECEIVED for review October 26, 1964. Accepted March 3, 1965. Work done under the auspices of the Greek Atomic Energy Commission.