Chemometrics: Theory and Application

4 The Unique Role of Target-Transformation Factor Analysis in the Chemometric Revolution

DARRYL G. HOWERY
Department of Chemistry, City University of New York, Brooklyn College, Brooklyn, NY 11210

Downloaded by UNIV OF CALIFORNIA SAN DIEGO on November 30, 2015 | http://pubs.acs.org Publication Date: June 1, 1977 | doi: 10.1021/bk-1977-0052.ch004

In Chemometrics: Theory and Application; Kowalski, B.; ACS Symposium Series; American Chemical Society: Washington, DC, 1977.

A mathematical-analysis revolution of major import has been occurring in chemistry during the past decade. This symposium is a logical manifestation of the revolution. By adapting a battery of mathematical/statistical techniques to high-speed computers, researchers in chemometrics can extract new and heretofore unobtainable insights from large, multifactor data sets. Factor analysis, a major weapon of the revolution, is proving to be a versatile, general method for analyzing matrices of chemical data. In particular, the target-transformation method of factor analysis (1,2), which enables one to test empirical and theoretical models, offers powerful and unique potentialities for obtaining partial and even complete solutions to many kinds of chemical problems. The main objective of this presentation is to summarize the distinctive attributes of target-transformation factor analysis (TTFA).

Factor analytical solutions are of a form nicely adapted to chemistry. A data point, $d_{ij}$, in a data matrix is expressed as a linear sum of factors, each factor being the product of a row-designee cofactor and a column-designee cofactor. Mathematically, factor analytical solutions obey the equation:

$$ d_{ij} = \sum_{m=1}^{n} r_{im} c_{mj} \qquad (1) $$

where $n$ is the minimum number of factor terms, $m$, needed to predict the data adequately, and $r_{im}$ and $c_{mj}$ are the cofactors for the $i$th row designee and the $j$th column designee, respectively, associated with the $m$th factor. The central purpose of a TTFA is to derive information about the two sets of cofactors not only in an abstract (mathematical) sense but also in a real (physically significant) sense. In matrix notation, the data matrix is

$$ [D] = [R][C] \qquad (2) $$

where [R] is the row matrix containing a row for each row


designee and a column for each cofactor, and [C] is the column matrix having a column for each column designee and a row for each cofactor.

Solutions of the type indicated by equations (1) and (2) are especially suited to studies of entity-entity data matrices. The abstract factors are related in some manner to those real factor terms which measurably influence the data. Even more conveniently, each cofactor pair in a given factor can be transformed via target transformation to specific properties of the row designees and the column designees which are responsible for the factor. In a solute-solvent problem, for example, the factors correspond to the important energies of interaction and the cofactors pinpoint the nature of the interactions in terms of the properties of the pure solute and the pure solvent. (Entities can be not only the usual chemical particles such as molecules, ions and radicals, but also, e.g., biological species, persons, political groups and celestial bodies.)

Factor analytical research can be based upon the target transformation technique and/or upon what we term the abstract factor analytical approach. In abstract (traditional) FA, abstract solutions are obtained under various mathematical constraints. The investigator then tries to gain insight by examining the coefficients in the matrices generated in the abstract solution.
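Equations (1) and (2) state the same model, once element by element and once in matrix form. The minimal Python sketch below checks that equivalence on a made-up two-factor problem. The numbers and the names `R`, `C`, `d_element` and `data_matrix` are illustrative assumptions, not data or code from the studies discussed here.

```python
# Hypothetical two-factor example (n = 2): 3 row designees, 4 column designees.
# Row matrix [R]: one row per row designee, one column per cofactor.
R = [[1.0, 0.5],
     [2.0, 1.0],
     [0.0, 3.0]]
# Column matrix [C]: one row per cofactor, one column per column designee.
C = [[1.0, 2.0, 0.0, 1.0],
     [4.0, 0.0, 2.0, 1.0]]


def d_element(R, C, i, j):
    """Equation (1): d_ij as a sum over the n factor terms r_im * c_mj."""
    n = len(C)
    return sum(R[i][m] * C[m][j] for m in range(n))


def data_matrix(R, C):
    """Equation (2): the full data matrix [D] = [R][C], built element by element."""
    return [[d_element(R, C, i, j) for j in range(len(C[0]))]
            for i in range(len(R))]


D = data_matrix(R, C)
for row in D:
    print(row)
```

Every row of `D` is a linear combination of the n rows of `C`, which is why no more than n factor terms are needed to reproduce such a matrix.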
Abstract FA, long used in the psychological and social sciences, can be gainfully applied to certain types of chemical problems (3,4). The target-transformation method of Malinowski (1) opens new territory by enabling the researcher to test parameters of the row and column designees of the matrix. The TTFA extension, in allowing one the possibility of transforming from abstract factors to real factors of the designees, alleviates for the physical scientist a major weakness of abstract FA.

The steps in a complete TTFA (data preparation, reproduction, target transformation, combination and prediction) have been discussed elsewhere (5). The number of factors required in equation (1) can be estimated in the short-circuit reproduction procedure using both experimental-error and theoretical criteria, as was deftly explained in the previous talk (6). The model-testing capability of the target transformation step is the heart of factor analysis for the physical scientist. A best complete model, i.e., the best real solution, is generated in the combination step.

TTFA has been thoroughly tested during the past six years. Howery (5) and Weiner (7), in recent reviews which complement each other, consider the philosophy, theory, procedures and applications of TTFA. Details of the mathematical development are given in the already classic paper of Malinowski and coworkers (2).

TTFA can be utilized using a blend of theoretical and empirical insights. Important TTFA's based at least in part


on a theoretical framework include the study of solute-solvent interactions influencing proton chemical shifts (2,8), the verification of gas chromatographic retention mechanisms (9), and the elucidation of solute-solvent effects on acidity constants (10). These papers illustrate the exceptional power of TTFA if the factor analyst starts with some theoretical help. In such cases, in-depth fundamental solutions can be achieved.

However, for most chemical problems, theoretical insight is minimal. Thus, the second way of using TTFA, involving a more empirical approach, has an even wider applicability in chemistry. Examples of empirical solutions include a detailed study of ether cofactors (11) (one of a series of researches on the solute cofactors influencing retention indices), and an investigation of solvent-metal effects on polarographic half-wave potentials (12). These studies show the potential for using TTFA to furnish useful empirical solutions in fields devoid of a theoretical underpinning.

Target Transformations

Unique Features. The quite unique attributes of the target transformation procedure center around the model-testing and model-building capabilities of TTFA. No other mathematical/statistical method shows such promise for extracting real cofactors and for developing complete solutions to multifactor problems.

1) Potential cofactors are separated and tested mathematically regardless of the complexity of the data space.
Any parameter of either the row or column designees can be investigated independently. Single terms in a theoretical or empirical model can be tested one at a time even though the other cofactors in the space are operative, an unmatched accomplishment of TTFA. Target transformation serves to curve fit vectors of parameters in a multifactor space.

2) Restrictions on the procedure are minimal. No knowledge of the other cofactors is required. One can start with complete ignorance of the nature of the real cofactors, in marked contrast with multiple regression analysis, which is applicable only if a complete model is specified. Furthermore, the real vectors to be tested need not be complete, a tremendous practical advantage since the data store for most types of chemical information is usually incomplete. Missing or uncertain points can be left blank on a test vector (a procedure termed "free-floating"). Such points will be predicted as a premium in successful target transformations.

3) The separation of the factor analytical solution into two parts as shown in equation (2) enables one to build up solutions for the two kinds of designees independently. Even if the problem in terms of one kind of designee appears hopelessly complex, it is still possible to derive a solution for


the other kind of designee. Two complete solutions can be developed term by term and each possible real solution involving sets of cofactors can be tested in the combination step. If all of the factors in the abstract solution are not spanned in a given combination, the reproduction via combination will be poor, indicating the sensitivity and the non-force-fitting nature of the step.

Two Examples. Two typical examples from recent research will illustrate the scope of the target-transformation approach for testing parameters of the designees. Computations were carried out on an I.B.M. 370/168 digital computer using a computer program in FORTRAN IV which has evolved over the decade (13). Vectors to be tested can contain data of any type which the researcher thinks might be indicative of the behavior of the designees (and hence possibly responsible for a cofactor). Both physical vectors (illustrated by example one) and structural vectors (exemplified by the second example) can be tested. Descriptors used in pattern recognition studies have much in common with the test vectors of TTFA. The essential question in evaluating the success of a target transformation is how well the best-fit predicted vector calculated from the least-squares method (2) agrees point-by-point with the real vector being tested. If the two vectors are reasonably similar, the test vector is taken to be a real cofactor.
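The least-squares test of a vector, including free-floated points, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract row matrix `R_abs` holds made-up numbers rather than the output of a real factor analysis, the helper names (`solve_linear`, `target_transform`, `rms_deviation`) are hypothetical, and the normal-equations solver stands in for whatever numerical procedure the FORTRAN program (13) actually uses.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting.

    Assumes A is non-singular; no special handling of a zero pivot is attempted.
    """
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def target_transform(R_abs, test):
    """Least-squares fit of a test vector in the abstract factor space.

    R_abs: abstract row matrix, one row per designee, one column per factor.
    test:  test values; None marks a free-floated (missing) point.
    Returns (t, predicted): the transformation vector and the predicted vector.
    """
    n = len(R_abs[0])
    known = [(row, y) for row, y in zip(R_abs, test) if y is not None]
    # Normal equations (R^T R) t = R^T y, built from the known points only.
    A = [[sum(row[a] * row[b] for row, _ in known) for b in range(n)]
         for a in range(n)]
    rhs = [sum(row[a] * y for row, y in known) for a in range(n)]
    t = solve_linear(A, rhs)
    # The prediction covers every designee, including free-floated ones.
    predicted = [sum(row[m] * t[m] for m in range(n)) for row in R_abs]
    return t, predicted


def rms_deviation(test, predicted):
    """Root-mean-square difference over the points that were actually tested."""
    pairs = [(y, p) for y, p in zip(test, predicted) if y is not None]
    return (sum((y - p) ** 2 for y, p in pairs) / len(pairs)) ** 0.5


# Hypothetical abstract solution: 5 designees in a 2-factor space.
R_abs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0]]

# A vector that lies in the factor space, with the third point free-floated.
good_test = [2.0, -1.0, None, 3.0, 0.0]
t_good, pred_good = target_transform(R_abs, good_test)

# A vector that does not lie in the factor space.
bad_test = [1.0, 1.0, 0.0, 1.0, 0.0]
t_bad, pred_bad = target_transform(R_abs, bad_test)

print("free-floated point predicted as", round(pred_good[2], 3))
print("rms deviation, first vector: ", round(rms_deviation(good_test, pred_good), 3))
print("rms deviation, second vector:", round(rms_deviation(bad_test, pred_bad), 3))
```

The first vector is reproduced closely and its blank entry is filled in by the fit, while the second is reproduced poorly. How small a deviation counts as "reasonably similar" remains a judgment made against experimental error.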
The examples to be cited involve, for pedagogical purposes, a difficult-to-interpret result and an unsuccessful transformation.

The first example is taken from a TTFA of the retention indices of organic solutes on stationary-phase solvents (14). Studies of retention indices have amply demonstrated the ability of TTFA to isolate cofactors in problems far too complicated for detailed theoretical treatments. Whereas solutes have been studied in detail, this is the first in-depth investigation of GLC solvents using TTFA. To better examine the cofactors of the solvents, only monomeric solvents were selected. (Previous TTFA's of retention indices have involved relatively complex, polymeric solvents for which test vectors are difficult to formulate.) The specific parameter tested by target transformation in this example is the molar refraction, a vector which has generally tested well as a solute cofactor. As shown in Table I, agreement between the test vector and the predicted vector is overall moderately good at best. Values for three deliberately free-floated points are predicted reasonably well. The molar refraction may be a cofactor; such borderline conclusions are common in TTFA.

The second example is selected to illustrate the manner in which structural vectors based on chemical insight can be employed to track down cofactors. Such vectors are especially useful for developing empirical solutions.
The example involves bond dissociation energies for radical-radical bonds (15), a most basic type of chemical data. The test vector shown in Table II is designed to ascertain if the group of halogen radicals (assigned test values of "1") is responsible for a unique cofactor, i.e., a cofactor not exhibited by the remaining group of radicals (given test values of "0"). Such cofactors can be tested without knowing the theoretical form of the interaction term. As can be seen in Table II, the target transformation is clearly unsuccessful; a unique halogen interaction is not a cofactor.

Table I - Target transformation of a physical test vector.

Data matrix: retention indices for 39 carbonyl-containing solutes on 18 monomeric stationary-phase solvents, data taken from reference 16.
Test vector: molar refraction estimated by summing specific refractions, TT in 6-factor space, free-floated points shown in parentheses.

Solvent                          Test Vector   Predicted Vector
bis(2-ethoxyethyl) phthalate       (79.7)          101.0
dibutyltetrachloro phthalate        96.6            98.6
di-2-ethylhexyl adipate            107.7           138.2
di-2-ethylhexyl sebacate           126.3           133.8
diglycerol                          37.7            39.0
diisodecyl phthalate               134.7           110.3
dioctyl phthalate                  113.6           105.9
dioctyl sebacate                   126.3           132.0
Flexol 8N8                         139.3           125.4
Hallocomid M18                    (140.8)          129.7
Hyprose SP80                       203.7           163.0
isooctyldecyl adipate              119.6           134.1
Quadrol                             76.3           117.9
sucrose acetate isobutyrate        192.4           176.8
sucrose octaacetate                141.3           166.2
TMP tripelargonate                 160.5           154.5
tricresyl phosphate                103.0            84.5
Zonyl E7                          (134.8)          117.6

Table II - Target transformation of a structural test vector.

Data matrix: bond dissociation energies involving 12 radicals with the same 12 radicals, data taken from compilation of A. Zavitsas (17).
Test vector: halogen uniqueness, TT in 3-factor space.

Radical      Test Vector   Predicted Vector
methyl          0.00            0.02
ethyl           0.00            0.04
isopropyl       0.00            0.06
t-butyl         0.00            0.06
phenyl          0.00            0.08
benzyl          0.00            0.04
fluorine        1.00            0.97
chlorine        1.00            0.34
bromine         1.00            0.17
hydroxy         0.00            0.27
methoxy         0.00            0.28
amine           0.00            0.47

Ramifications of TTFA

Combination. Sets of vectors can be utilized simultaneously in the combination step to find the best empirical solution. This step of FA is similar to multiple regression analysis in that complete models are utilized, but different from regression analysis in that FA does not prescribe that the model be force fitted. If any factor is missing in a proposed model, the sensitive combination step will lead to a poor reproduction. Even so, solutions having errors less than twice experimental error have been developed from thorough factor analyses of retention indices using the empirical approach. For example, in the problem referred to in the first TT example above, the best solution to the solvent part of the complex problem gave an error of 7.1 r.i. units; the empirical model can predict r.i.'s with an error of about one percent.

Prediction. The predictive ability of TTFA has as yet received little attention.
To illustrate the potential of this step, consider the prediction of a new row of data based on the best empirical solution obtained in the combination step. To calculate a new data point associated with an added row designee, x, and a column designee from the original data matrix, j, a modified form of equation (1) is employed:

$$ d_{xj} = \sum_{m=1}^{n} r_{\mathrm{real},xm}\, c_{\mathrm{calc},mj} \qquad (3) $$
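Numerically, equation (3) is a dot product between the real cofactors of the new row designee and one column of a calculated column matrix. The Python sketch below first obtains that column matrix from equation (2) by least squares and then predicts a new row. It is an illustration under assumptions: all numbers are invented, the names `fit_column_matrix` and `predict_row` are hypothetical, and solving the normal equations is just one convenient way to apply equation (2), not necessarily the procedure used in the published work.

```python
def solve_linear(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting.

    Assumes A is non-singular.
    """
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]  # augmented matrix (copy)
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_column_matrix(R_real, D):
    """Least-squares [C_calc] satisfying [D] ~ [R_real][C_calc], cf. equation (2)."""
    n = len(R_real[0])
    n_rows, n_cols = len(D), len(D[0])
    # Normal-equation matrix R^T R is shared by every column of D.
    A = [[sum(row[a] * row[b] for row in R_real) for b in range(n)]
         for a in range(n)]
    C_calc = [[0.0] * n_cols for _ in range(n)]
    for j in range(n_cols):
        rhs = [sum(R_real[i][a] * D[i][j] for i in range(n_rows)) for a in range(n)]
        coeffs = solve_linear(A, rhs)
        for m in range(n):
            C_calc[m][j] = coeffs[m]
    return C_calc


def predict_row(r_x, C_calc):
    """Equation (3): d_xj = sum over m of r_real,xm * c_calc,mj, for every column j."""
    return [sum(r_x[m] * C_calc[m][j] for m in range(len(r_x)))
            for j in range(len(C_calc[0]))]


# Hypothetical real cofactors for 4 original row designees (n = 2 cofactors).
R_real = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [0.0, 1.0]]
# Hypothetical "true" column cofactors, used only to manufacture noise-free data.
C_true = [[10.0, 0.0, 5.0], [1.0, 2.0, 3.0]]
D = [[sum(r[m] * C_true[m][j] for m in range(2)) for j in range(3)] for r in R_real]

C_calc = fit_column_matrix(R_real, D)
r_new = [2.0, 2.0]                 # real cofactors of a designee not in D
new_row = predict_row(r_new, C_calc)
print([round(v, 3) for v in new_row])
```

With real data the fit of [C_calc] would carry experimental error, so the predicted row would agree with measurement only to within the error of the combination step.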

The row-designee cofactors in equation (3), $r_{\mathrm{real},xm}$, are those key vectors from the best solution via combination, while the column-designee cofactors, $c_{\mathrm{calc},mj}$, are coefficients in the $[C_{\mathrm{calc}}]$ matrix (which is readily calculated using equation (2), given a solution $[R_{\mathrm{real}}]$ and the data matrix). To calculate the new datum, only the values of the $n$ real cofactors for the new designee and the $n$ coefficients in the $j$th column of the calculated column matrix are required.

For example, in the study of the cofactors of ethers (11), a real solution having the following six vectors: carbon number, total atom number, chain difference, chain ratio, total atom number squared, and boiling point squared, produced an error of only 5.4 r.i. units. The retention index of t-butylmethyl ether (a solute not incorporated in the original matrix) on butyltetrachloro phthalate (a solvent in the original matrix) is predicted using equation (3) to be 613 r.i. units, nicely in agreement with the experimental value of 609 ± 3 units. The mean error for the entire new row involving the new ether with the 25 original solvents is 3.5 units. Such satisfactory predictions indicate that the empirical solution quite adequately spans all of the solute cofactors.

The examples put forth during this presentation demonstrate clearly that the TTFA approach can be utilized to thoroughly characterize a chemical data space. Target transformation methodology seems destined to play a leading and unique role in the chemometric revolution.

Literature Cited

1. Malinowski, E. R., Doctoral Dissertation, Stevens Inst. Technology, Hoboken, N. J., 1961.
2. Weiner, P. H., Malinowski, E. R., and Levinstone, A., J. Phys. Chem., (1970), 74, 4537.
3. Bulmer, J. T., and Shurvell, H. F., J. Phys. Chem., (1973), 77, 256.
4. Rozett, R. W., and Petersen, E. M., Anal. Chem., (1976), 48, 817.
5. Howery, D. G., Amer. Lab., (1976), 8(2), 14.
6. Malinowski, E. R., in "Chemometrics: Theory and Applications," B. R. Kowalski, Ed., A. C. S. Symposium Series, P. xxx, 1977.
7. Weiner, P. H., Chem. Tech., in press.
8. Weiner, P. H., and Malinowski, E. R., J. Phys. Chem., (1971), 75, 3160.
9. Weiner, P. H., Liao, H. L., and Karger, B. L., Anal. Chem., (1974), 46, 2182.
10. Weiner, P. H., J. Amer. Chem. Soc., (1973), 95, 5845.
11. Selzer, R. B., and Howery, D. G., J. Chromatogr., (1975), 115, 139.
12. Howery, D. G., Bull. Chem. Soc. Japan, (1972), 45, 2643.
13. Malinowski, E. R., Howery, D. G., Weiner, P. H., Soroka, J. M., Funke, R. T., Selzer, R. B., and Levinstone, A., "FACTANAL - Target-Transformation Factor Analysis," Program 320, Quant. Chem. Prog. Exch., Indiana Univ., Bloomington, Ind., 1976.
14. Soroka, J. M., and Howery, D. G., to be submitted.
15. Howery, D. G., to be submitted.
16. McReynolds, W. O., "Gas Chromatographic Retention Data," Preston Tech. Abstracts Co., Niles, Ill., 1966.
17. Zavitsas, A., Long Island Univ., Brooklyn, N. Y., private communication.
