Anal. Chem. 1988, 60, 1154-1158


Iterative Target Testing for Calculation of Missing Data Points

Thomas H. Brayden,* Paul A. Poropatic, and Jill L. Watanabe¹

LTV Aircraft Products Group, 9314 West Jefferson, Dallas, Texas 75211

Target factor analysis has been developed to permit analysis of the properties of a data set. As long as the data set spans the factor space of a property, a mathematical target testing procedure, such as Malinowski's free floating procedure, can be used to provide prediction values for missing data. We introduce an alternate procedure for target testing which employs an iterative technique. This iterative procedure yields predictions for points missing in the target data. We also present mathematical proofs that the iterated prediction value will converge and that the convergence value will be the true value corresponding to the data set. The results obtained are in complete accord with those derived by free floating.

¹Present address: Hoechst Celanese Corp., Somerville, NJ 08876.

Factor analysis is a data handling technique that has proven useful in the extraction of information from complex data sets as well as in the removal of error from data. The elimination of most random error is accomplished through the use of a least-squares technique where only the significant portion of the data is retained (1, 2). Identification of the real factors responsible for variations in the data and quantification of these factors are accomplished through the application of target testing procedures (2). Target testing utilizes a test vector of measured values that represent one or more of the real factors that contribute to the data. This vector is transformed into a prediction vector by obtaining a least-squares fit to the significant portion of the data (2, 3).

We have developed a variation of the target testing procedure which involves iterative target testing. When iterative target testing is carried out, an arbitrary value, such as a zero, is entered in the test vector for the element corresponding to the unknown (1, 4). Target testing then yields a preliminary prediction value for that element. The zero in the test vector is replaced by the preliminary prediction value, with the other elements remaining unchanged. The process is repeated until convergence to a constant prediction value is achieved. Herein, we provide proof that iteration does result in convergence and that the prediction value thus obtained is, in fact, the true value when allowance is made for error embedded in the data or test vector. Also, we show that iterative target testing provides the same prediction value independent of the magnitude of the value arbitrarily entered in the place of the unknown. An error-free geometric example is provided to demonstrate the technique, and the technique is also employed with a published real data set.

THEORY

Factor analysis is an appropriate technique for the analysis of systems where a property can be represented as a linear combination of measured variables (2). This would encompass many chemical and mathematical systems (4-8). A data matrix [D] is constructed with one dimension corresponding to different samples and the other to the conditions employed to obtain the measured values comprising the matrix elements. [D] is transformed into a square covariance matrix [Z], which can be decomposed into a set of orthonormal eigenvectors, each with a corresponding eigenvalue. Tabulation of these orthonormal vectors into a rectangular array produces the column matrix [C]. The eigenvalues make up the trace of a diagonal matrix [A] (2)

$$[Z] = [D]^{T}[D] \tag{1}$$

$$[Z] = [C]^{T}[A][C] \tag{2}$$

The portion of [C] free of extractable error, [C*], corresponding to the significant eigenvalues is used as a coordinate system onto which [D] is mapped to yield the abstract row matrix [R*]. [A*] can be represented in terms of [R*] (2)

$$[R^*] = [D][C^*]^{T} = [r_{ij}] \tag{3}$$

$$[R^*]^{T}[R^*] = \bigl([D][C^*]^{T}\bigr)^{T}\bigl([D][C^*]^{T}\bigr) = [C^*][D]^{T}[D][C^*]^{T} = [C^*][Z][C^*]^{T} = [A^*] \tag{4}$$
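The decomposition in eq 1-4 can be sketched numerically. The following is a minimal illustration, not the authors' implementation: the function name `abstract_factors`, the argument `n_factors` (the number of significant eigenvalues, assumed known), and the small two-factor data set are all hypothetical.

```python
import numpy as np

def abstract_factors(D, n_factors):
    """Return ([C*], [R*], significant eigenvalues) for a data matrix D."""
    Z = D.T @ D                                   # eq 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Z)          # Z is symmetric; eigenvalues ascend
    keep = np.argsort(eigvals)[::-1][:n_factors]  # indices of the largest eigenvalues
    C_star = eigvecs[:, keep].T                   # rows of [C*] are eigenvectors
    R_star = D @ C_star.T                         # eq 3: abstract row matrix
    return C_star, R_star, eigvals[keep]

# Hypothetical error-free data: 6 samples x 5 conditions built from 2 factors.
scores = np.array([[1, 0], [0, 1], [1, 1], [1, -1], [2, 1], [1, 2]], dtype=float)
loadings = np.array([[1, 2, 0, 1, 3], [0, 1, 1, 2, 1]], dtype=float)
D = scores @ loadings

C_star, R_star, significant = abstract_factors(D, n_factors=2)
A_star = R_star.T @ R_star                        # eq 4

# Check eq 4: [R*]^T[R*] is the diagonal matrix of significant eigenvalues.
print(np.allclose(A_star, np.diag(significant)))
# With error-free data, two factors reproduce [D].
print(np.allclose(R_star @ C_star, D))
```

With real data the choice of `n_factors` is itself a judgment about which eigenvalues are significant; here it is fixed by construction.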

Target testing, the calculation of a prediction value, is then carried out by employing the following equation, where TV is the test vector and PV is the prediction vector (2)

$$PV = [R^*][A^*]^{-1}[R^*]^{T}\,TV \tag{5}$$

The product of matrices used to convert TV to PV can be represented as a generic matrix [P] as shown below.

$$PV = [P]\,TV \tag{6}$$

The vectors TV and PV will have a component that corresponds to each of the r rows in the data set. Employing the final row as the unknown and entering a zero for the test value, a prediction value is obtained that is biased by the entry of the zero.

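$$\begin{pmatrix} PV_1 \\ \vdots \\ PV_r \end{pmatrix} = \begin{pmatrix} \ddots & & \vdots & \vdots \\ G & \cdots & H & I \end{pmatrix} \begin{pmatrix} TV_1 \\ \vdots \\ TV_{r-1} \\ TV_r \end{pmatrix} \tag{8}$$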

$$PV_r = (G\,TV_1 + \cdots + H\,TV_{r-1}) \tag{9}$$

Replacing TV_r with PV_r yields a new prediction value. Iterating in this manner produces an expression for PV_r which can be represented as a series of a geometric form, where m is the number of iterations (9)

$$PV_r = \sum_{i=1}^{m} (G\,TV_1 + \cdots + H\,TV_{r-1})\,I^{(i-1)} \tag{10}$$
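The iteration described by eq 9 and 10 can be tried on a small error-free example. This is a sketch under stated assumptions rather than the authors' code: the helper names `projection_matrix` and `iterate_missing`, the stopping tolerance, and the two-factor data set are hypothetical, and the target is constructed so that its true final value is 4.

```python
import numpy as np

def projection_matrix(D, n_factors):
    """[P] = [R*][A*]^-1[R*]^T (eq 5 and 6), keeping the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(D.T @ D)
    C_star = eigvecs[:, np.argsort(eigvals)[::-1][:n_factors]].T
    R_star = D @ C_star.T
    return R_star @ np.linalg.inv(R_star.T @ R_star) @ R_star.T

def iterate_missing(P, test_vector, missing, start=0.0, tol=1e-12, max_iter=10_000):
    """Replace the missing element by its own prediction until it stops changing."""
    tv = np.array(test_vector, dtype=float)
    tv[missing] = start                      # arbitrary entry for the unknown
    for n_iter in range(1, max_iter + 1):
        prediction = P[missing] @ tv         # row of [P] times the test vector
        change = abs(prediction - tv[missing])
        tv[missing] = prediction             # feed the prediction back in
        if change < tol:
            break
    return tv[missing], n_iter

# Hypothetical error-free data: 6 samples x 5 conditions built from 2 factors.
scores = np.array([[1, 0], [0, 1], [1, 1], [1, -1], [2, 1], [1, 2]], dtype=float)
loadings = np.array([[1, 2, 0, 1, 3], [0, 1, 1, 2, 1]], dtype=float)
D = scores @ loadings
P = projection_matrix(D, n_factors=2)

# A target that is a linear combination of the factors; the last value is
# treated as unknown and overwritten by `start` inside iterate_missing.
true_target = scores @ np.array([2.0, 1.0])
from_zero, _ = iterate_missing(P, true_target, missing=-1, start=0.0)
from_hundred, _ = iterate_missing(P, true_target, missing=-1, start=100.0)
print(from_zero, from_hundred, true_target[-1])
```

For this example the diagonal element I is 0.5, so each pass halves the remaining error. As m grows, the geometric series in eq 10 sums to (G TV_1 + ... + H TV_{r-1})/(1 - I), and the contribution of any nonzero starting value is multiplied by I at every pass, which is why both starting values lead to the same result.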

0003-2700/88/0360-1154$01.50/0 © 1988 American Chemical Society

ANALYTICAL CHEMISTRY, VOL. 60, NO. 11, JUNE 1, 1988

The value of PV_r will converge if the absolute value of I is less than one (9). In order to prove that this criterion is satisfied, we solve for I by carrying out an expansion of the row of [P] corresponding to the iterated value, which in this case is the final row. The dot product of the final row with itself is equivalent to the diagonal element I as shown in Chart I. Rearranging the resulting equation and applying the quadratic formula provides the expression for I

$$G^{2} + \cdots + H^{2} + I^{2} = I \tag{11}$$

$$I = \frac{1 \pm \bigl(1 - 4(G^{2} + \cdots + H^{2})\bigr)^{1/2}}{2} \tag{12}$$
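The identity in eq 11 follows from two properties of [P] that can be checked directly from eq 4 and 5. The derivation below is a sketch that assumes [A*] is invertible; the element symbol p_{rj} is introduced here for illustration, with p_{r1} = G, ..., p_{r,r-1} = H, and p_{rr} = I.

```latex
\begin{align*}
[P]^{T} &= \bigl([R^*][A^*]^{-1}[R^*]^{T}\bigr)^{T}
         = [R^*][A^*]^{-1}[R^*]^{T} = [P]
  && \text{($[A^*]$ is symmetric)} \\
[P][P] &= [R^*][A^*]^{-1}\bigl([R^*]^{T}[R^*]\bigr)[A^*]^{-1}[R^*]^{T}
        = [R^*][A^*]^{-1}[R^*]^{T} = [P]
  && \text{(by eq 4)} \\
I = p_{rr} &= \sum_{j=1}^{r} p_{rj}\,p_{jr}
            = \sum_{j=1}^{r} p_{rj}^{2}
            = G^{2} + \cdots + H^{2} + I^{2}
\end{align*}
```

Treating the last line as a quadratic in I, namely I^2 - I + (G^2 + ... + H^2) = 0, gives eq 12.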

Since all values G through H are derived from measured values, they are all real numbers. Likewise, I is a real number, restricting the range of (G² + ... + H²) as indicated

$$0 \le (G^{2} + \cdots + H^{2}) \le 1/4 \tag{13}$$
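Eq 11 and 13 can be checked numerically for every row of a projection matrix. The sketch below builds [P] as QQᵀ from a QR factorization, a shortcut that yields the same orthogonal projector as eq 5 when the columns of the chosen matrix span the factor space; the matrix `X` is a hypothetical two-factor example.

```python
import numpy as np

# Hypothetical basis for a two-factor space with six rows.
X = np.array([[1, 0], [0, 1], [1, 1], [1, -1], [2, 1], [1, 2]], dtype=float)
Q, _ = np.linalg.qr(X)                 # reduced QR: Q has orthonormal columns
P = Q @ Q.T                            # orthogonal projector onto the factor space

diagonal = np.diag(P)                  # I, for each possible choice of unknown row
off_diagonal = np.sum(P**2, axis=1) - diagonal**2   # G^2 + ... + H^2 for each row

# eq 11: the off-diagonal sum of squares plus I^2 equals I.
print(np.allclose(off_diagonal + diagonal**2, diagonal))
# eq 13: the off-diagonal sum of squares lies between 0 and 1/4.
print(off_diagonal.min() >= -1e-12, off_diagonal.max() <= 0.25 + 1e-12)
```

The small tolerances in the last line only absorb floating-point rounding; for the final row of this example I is 0.5, which puts the sum of squares at the upper limit of eq 13.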

From eq 11, we see that I is positive and, if I exists and also corresponds to a significant eigenvalue, it must be greater than zero, as must G through H, if the unknown is truly a linear combination of the factors composing the system. The range of I is thus defined by eq 12 and 13

0