Mass spectral pattern recognition via techniques of mathematical


ANALYTICAL CHEMISTRY, VOL. 50, NO. 6, MAY 1978


Mass Spectral Pattern Recognition via Techniques of Mathematical Programming

Donald W. Fausett*
Department of Mathematics, Colorado School of Mines, Golden, Colorado 80401

James H. Weber
Laramie Energy Research Center, Department of Energy, Laramie, Wyoming 82071

The problem considered is the determination by low-resolution mass spectrometry of the constituent compounds and their relative concentrations in a sample mixture of chemical compounds. The three methods presented for solution of this problem require that a reference set of mass spectra of pure compounds be available that contains the spectra of the dominant compounds present in the mixture. Each method involves the identification of a subset of the reference compounds whose spectra generate the best fit (in some sense) to the spectrum of the sample in comparison with all other possible subsets. The formulation of the three methods of solution and examples of results obtained by the use of each method are presented. Each of the methods of solution has certain advantages and disadvantages associated with it. Least Squares Approximation is the most widely known and used, and there is a large amount of statistical information available for it. MAD Approximation and Chebyshev Approximation are readily implemented by the use of the simplex algorithm from Linear Programming; this permits sensitivity analyses of the solutions from the final tableaux. The ultimate decision as to which method is preferable is left to the user.

0003-2700/78/0350-0722$01.00/0 © 1978 American Chemical Society

Computer processing of spectral data to aid in the analysis of a chemical sample has been an area of active research for several years (1-38). Good reviews of much of this work from a pattern recognition point of view have been provided in (23, 34). Following Isenhour, Kowalski, and Jurs (23), we define pattern recognition as applied to chemical problems to mean the use of empirically known relations among certain (reference) data to deduce unknown relations among other (sample) data.

We consider the following problem: Given the observed low-resolution mass spectrum of a chemical sample (in general, a mixture), how can one determine the identities and the relative concentrations of the compounds present in the sample? It is desirable that this determination be made without recourse to any information requiring further laboratory analysis of the sample. An exact solution to this problem is impossible in practice because of errors in the measurements of peak intensities in mass spectra. We present three methods for obtaining approximate solutions by the mathematical construction of models of the chemical sample. Comparison of the spectra generated by the models with the observed spectrum of the sample provides the bases for evaluating the accuracy of the approximations. Although our discussion is in terms of low-resolution mass spectra, the techniques presented are equally applicable to spectral data from other sources.

The construction of each model requires that a reference set of standard mass spectra of pure chemical compounds be available which contains the spectra of the dominant compounds present in the sample. The subset of the reference compounds whose spectra generate the best fit (according to one of three criteria) to the sample spectrum in comparison with all other possible subsets is identified by a mathematical programming procedure; those compounds with relative concentrations as computed to determine the best fit to the sample spectrum constitute the model. The three criteria used to measure goodness of fit to the sample spectrum are: (1) minimization of the sum of the squares of the residuals (Least Squares Approximation); (2) minimization of the sum of the absolute values of the residuals (MAD Approximation); and (3) minimization of the largest absolute value among the residuals (Chebyshev Approximation). The formulation of the problem for each method of approximation is discussed, the method of solution is indicated, and examples of results obtained are presented.
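The three criteria differ only in how the residuals are aggregated. As a minimal sketch (the function name `fit_criteria` and the residual values are hypothetical and do not come from the paper), the following Python computes all three measures for the residuals of one candidate model:

```python
def fit_criteria(residuals):
    """Return (sum of squares, sum of absolute values, largest absolute value)."""
    sum_of_squares = sum(e * e for e in residuals)   # Least Squares criterion
    sum_of_abs = sum(abs(e) for e in residuals)      # MAD criterion
    largest_abs = max(abs(e) for e in residuals)     # Chebyshev criterion
    return sum_of_squares, sum_of_abs, largest_abs


# Hypothetical residuals: model abundance minus sample abundance at four m/q values.
example_residuals = [0.5, -1.0, 0.25, 0.0]
print(fit_criteria(example_residuals))
```

Each method selects the relative concentrations that make its own measure as small as possible, so the three models need not agree when the sample spectrum contains measurement errors.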

LINEARITY PROPERTY OF THE MASS SPECTRUM

An assumption that is fundamental to our methods of analysis is that the mass spectrum of a sample behaves in a linear manner over a broad range of concentrations of its constituent compounds (39). To illustrate this concept more precisely, and to facilitate the discussion of the approximation methods that is to follow, we introduce some conventions and notation.

A reference set of pure chemical compounds is selected which we assume will include most, if not all, of the compounds likely to be present in any significant quantity in a given sample that is to be analyzed. These reference compounds are ordered so that we can specify each one by referring to an index number; i.e., so that we may speak without ambiguity of the first reference compound, the second reference compound, and so forth. Let $n$ denote the number of compounds in the reference set.

The m/q's are chosen which are to be used to represent the mass spectra of the reference compounds and the sample. Each m/q in the spectrum of a compound represents information that can be used to help identify the presence of that compound in a sample; therefore, we recommend that the entire spectrum be used. The spectrum of each reference compound, and of the sample, is normalized by scaling the spectrum so that the sum of the relative abundances is equal to some standard number. Any positive number can be used as the standard as long as all spectra are normalized relative to the same standard. Let $m$ denote the number of m/q's to be used; let $r_{ij}$ denote the relative abundance of the ions at the $j$th m/q in the normalized standard spectrum of the $i$th reference compound for $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$; and let $s_j$ denote the relative abundance of the ions at the $j$th m/q in the normalized spectrum of the sample for $j = 1, 2, \ldots, m$. If $S$ is the standard number for normalizing, then we have that

$$\sum_{j=1}^{m} r_{ij} = S \quad \text{for } i = 1, 2, \ldots, n$$

$$\sum_{j=1}^{m} s_j = S$$

Now suppose that a sample is composed of the first three of our reference compounds, and let $x_i$ denote the relative concentration of the $i$th compound for $i = 1, 2, \ldots, n$; then we know that

$$x_1 + x_2 + x_3 = 1.0$$

where $0 \le x_i \le 1$ for $i = 1, 2, 3$; and, by virtue of the linearity property, we also know that

$$x_1 r_{1j} + x_2 r_{2j} + x_3 r_{3j} = s_j \quad \text{for } j = 1, 2, \ldots, m$$

This system of equations expresses the linearity property that the contribution of each compound present to the $j$th peak of the sample spectrum is the product of the relative concentration of that compound and the relative abundance of the ions at the $j$th m/q in the normalized standard spectrum of the compound, and that the relative abundance of the ions at the $j$th m/q in the sample spectrum is the sum of the contributions of the compounds present. These equations are not exact in practice due to errors in measuring peak intensities. The general statement of this system of equations is

$$\sum_{i=1}^{n} x_i r_{ij} = s_j \quad \text{for } j = 1, 2, \ldots, m$$

where $x_i = 0$ in case the $i$th compound is not present in the sample. Given that this property of linearity holds for the mass spectrum of a sample, we shall invoke some powerful techniques of mathematical programming for our analyses of the sample (40, 41).

MATHEMATICAL PROGRAMMING TECHNIQUES

We shall utilize the mathematical programming techniques of linear programming and quadratic programming. Linear programming is probably the most popular branch of mathematical programming. It treats the problem of maximizing (or minimizing) a linear function of the unknown variables subject to a set of linear constraints on those variables (42, 43). Most computer centers include linear programming routines as part of their users' libraries. Two of our approximation methods can be formulated as linear programming problems.

Quadratic programming is a generalization of linear programming. It treats the problem of maximizing (or minimizing) a quadratic function of the unknown variables subject to a set of linear constraints (44). Many computer centers have quadratic programming routines available for their users. One of our approximation methods can be formulated as a quadratic programming problem.

Good discussions of curve fitting applications of linear programming are contained in (45-49). A Fortran language computer code for solving both linear and quadratic programming problems is given in (50), subject to a minor typographical correction appearing in (51). The computer routine is based on Lemke's complementary algorithm (52, 53). A Fortran language code that is very efficient for obtaining solutions based on the MAD model appears in (54).

LEAST SQUARES MODEL

For this model, we seek to minimize the sum of the squares of the residuals. In mathematical programming terminology, we wish to

$$\text{minimize } z = \sum_{j=1}^{m} \left( \sum_{i=1}^{n} x_i r_{ij} - s_j \right)^2$$

$$\text{subject to } \sum_{i=1}^{n} x_i = 1$$

where $x_i \ge 0$ for $i = 1, \ldots, n$. This is almost in the form of a quadratic programming problem as it stands; the only modification required is to delete the quantity $\sum_{j=1}^{m} s_j^2$ from the objective function to obtain an equivalent quadratic programming problem:

$$\text{minimize } z = \sum_{j=1}^{m} \left[ \left( \sum_{i=1}^{n} x_i r_{ij} \right)^2 - 2 s_j \sum_{i=1}^{n} x_i r_{ij} \right]$$

$$\text{subject to } \sum_{i=1}^{n} x_i = 1$$

where $x_i \ge 0$ for $i = 1, \ldots, n$. We use Ravindran's computer code (50) to obtain approximate solutions to the sample analysis problem based on this model.
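To make the Least Squares model concrete, here is a minimal Python sketch under stated assumptions. It does not reproduce Lemke's complementary algorithm or Ravindran's code; it swaps in projected gradient descent, which handles the same constraints (nonnegative concentrations that sum to one). The reference spectra, the sample, and the function names are hypothetical, and the sample is built from the linearity property with concentrations 0.5, 0.5, and 0.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {x : x_i >= 0, sum of x_i = 1}."""
    running_sum, shift = 0.0, 0.0
    for k, value in enumerate(sorted(v, reverse=True), start=1):
        running_sum += value
        candidate = (running_sum - 1.0) / k
        if value - candidate > 0:
            shift = candidate
    return [max(value - shift, 0.0) for value in v]


def least_squares_fit(r, s, iterations=2000):
    """Minimize sum_j (sum_i x_i r[i][j] - s[j])^2 over the simplex."""
    n, m = len(r), len(s)
    # Step 1/L, where L = 2 * (sum of squared entries of r) bounds the
    # Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * sum(entry * entry for row in r for entry in row))
    x = [1.0 / n] * n
    for _ in range(iterations):
        residual = [sum(x[i] * r[i][j] for i in range(n)) - s[j] for j in range(m)]
        gradient = [2.0 * sum(r[i][j] * residual[j] for j in range(m)) for i in range(n)]
        x = project_to_simplex([x[i] - step * gradient[i] for i in range(n)])
    return x


# Hypothetical normalized reference spectra (3 compounds, 4 m/q values, S = 1).
reference = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.0, 0.1, 0.3, 0.6],
]
# Sample built by the linearity property from concentrations 0.5, 0.5, 0.0.
sample = [0.5 * reference[0][j] + 0.5 * reference[1][j] for j in range(4)]
concentrations = least_squares_fit(reference, sample)
print([round(value, 4) for value in concentrations])
```

Because the constraints are enforced at every iteration, a computed concentration can never be negative, which mirrors the behavior the quadratic programming formulation guarantees.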


MINIMUM ABSOLUTE DEVIATION (MAD) MODEL

This model is based on the minimization of the sum of the absolute values of the residuals; i.e.,

$$\text{minimize } z = \sum_{j=1}^{m} \left| \sum_{i=1}^{n} x_i r_{ij} - s_j \right|$$

$$\text{subject to } \sum_{i=1}^{n} x_i = 1$$

where $x_i \ge 0$ for $i = 1, \ldots, n$. To bring this problem into the form of a linear programming problem, we introduce new variables $e_j^+$ (overestimation errors) and $e_j^-$ (underestimation errors), and we write the new constraint equations

$$\sum_{i=1}^{n} x_i r_{ij} - s_j = e_j^+ - e_j^-$$

for $j = 1, 2, \ldots, m$. We can now write an equivalent problem of the desired form (46-48):

$$\text{minimize } z = \sum_{j=1}^{m} \left( e_j^+ + e_j^- \right)$$

subject to

$$\sum_{i=1}^{n} x_i r_{ij} - s_j - e_j^+ + e_j^- = 0 \quad \text{for } j = 1, 2, \ldots, m$$

$$\sum_{i=1}^{n} x_i = 1$$

$$e_j^+ \ge 0 \quad \text{for } j = 1, 2, \ldots, m$$

$$e_j^- \ge 0 \quad \text{for } j = 1, 2, \ldots, m$$

$$x_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n$$

We have written a Fortran IV language computer code that utilizes Phase II of the revised simplex method (42) to obtain approximate solutions to the sample analysis problem based on this model. An important observation is that, at most, one term of each pair $e_j^+$ and $e_j^-$ can be greater than zero because the column vectors in the constraint matrix corresponding to each such pair are linearly dependent, and therefore both vectors cannot be in the basis simultaneously at any stage of the simplex iteration. A more intuitive explanation is that the model cannot simultaneously overestimate and underestimate the value of the relative abundance of the ions at the $j$th m/q in the sample spectrum.

CHEBYSHEV MODEL

This model is based on the minimization of the maximum absolute value among the residuals; i.e.,

$$\text{minimize } z = \max_{1 \le j \le m} \left| \sum_{i=1}^{n} x_i r_{ij} - s_j \right|$$

$$\text{subject to } \sum_{i=1}^{n} x_i = 1$$

$$x_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n$$

This problem can be transformed into one of the desired form by introducing a new variable $\Delta$ and writing the new constraint conditions

$$\left| \sum_{i=1}^{n} x_i r_{ij} - s_j \right| \le \Delta \quad \text{for } j = 1, 2, \ldots, m$$

The resulting linear programming problem is

$$\text{minimize } z = \Delta$$

subject to

$$\sum_{i=1}^{n} x_i r_{ij} - \Delta \le s_j \quad \text{for } j = 1, 2, \ldots, m$$

$$-\sum_{i=1}^{n} x_i r_{ij} - \Delta \le -s_j \quad \text{for } j = 1, 2, \ldots, m$$

$$\sum_{i=1}^{n} x_i = 1$$

$$x_i \ge 0 \quad \text{for } i = 1, 2, \ldots, n$$
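The MAD and Chebyshev formulations can be illustrated by assembling their linear programming data explicitly. The Python sketch below is an illustration under stated assumptions, not the authors' Fortran IV code: it builds the cost vector and constraint rows for each model and then checks that a hypothetical candidate $x$ gives a feasible point whose objective value equals the sum (MAD) or the maximum (Chebyshev) of the absolute residuals. It does not run the simplex method, so the candidate is feasible but not necessarily optimal. All function names and numerical values are hypothetical.

```python
def mad_lp(r, s):
    """MAD model data for variables (x_1..x_n, e_1+..e_m+, e_1-..e_m-), all >= 0."""
    n, m = len(r), len(s)
    cost = [0.0] * n + [1.0] * (2 * m)            # z = sum_j (e_j+ + e_j-)
    a_eq, b_eq = [], []
    for j in range(m):
        row = [r[i][j] for i in range(n)] + [0.0] * (2 * m)
        row[n + j] = -1.0                         # coefficient of e_j+
        row[n + m + j] = 1.0                      # coefficient of e_j-
        a_eq.append(row)
        b_eq.append(s[j])                         # sum_i x_i r_ij - e_j+ + e_j- = s_j
    a_eq.append([1.0] * n + [0.0] * (2 * m))
    b_eq.append(1.0)                              # sum_i x_i = 1
    return cost, a_eq, b_eq


def chebyshev_lp(r, s):
    """Chebyshev model data for variables (x_1..x_n, delta), all >= 0."""
    n, m = len(r), len(s)
    cost = [0.0] * n + [1.0]                      # z = delta
    a_ub, b_ub = [], []
    for j in range(m):
        column = [r[i][j] for i in range(n)]
        a_ub.append(column + [-1.0])              # sum_i x_i r_ij - delta <= s_j
        b_ub.append(s[j])
        a_ub.append([-value for value in column] + [-1.0])  # -sum_i x_i r_ij - delta <= -s_j
        b_ub.append(-s[j])
    return cost, a_ub, b_ub, [[1.0] * n + [0.0]], [1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical spectra (3 compounds, 4 m/q values, S = 1) and a hypothetical candidate x.
r = [[0.5, 0.25, 0.25, 0.0], [0.0, 0.5, 0.25, 0.25], [0.25, 0.0, 0.25, 0.5]]
s = [0.25, 0.5, 0.125, 0.125]
x = [0.5, 0.5, 0.0]
residual = [dot(x, [r[i][j] for i in range(3)]) - s[j] for j in range(4)]
e_plus = [max(value, 0.0) for value in residual]     # overestimation errors
e_minus = [max(-value, 0.0) for value in residual]   # underestimation errors
delta = max(abs(value) for value in residual)

mad_cost, mad_a_eq, mad_b_eq = mad_lp(r, s)
mad_point = x + e_plus + e_minus
cheb_cost, cheb_a_ub, cheb_b_ub, cheb_a_eq, cheb_b_eq = chebyshev_lp(r, s)
cheb_point = x + [delta]
print(dot(mad_cost, mad_point), dot(cheb_cost, cheb_point))
```

In the candidate point, each pair of error variables has at most one nonzero member, in line with the observation made for the MAD model. The returned arrays follow the common layout of minimizing a cost vector subject to equality and inequality rows, so they could be passed to whatever linear programming routine is available.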

$s_j$ for some value of $j$, then the $i$th reference compound is eliminated from further consideration in the analysis. Next there is an investigation to see whether there are any reference compounds whose presence in the sample is indicated initially. This is accomplished by checking whether there is some m/q where the sample relative abundance is greater (or less) than the relative abundances of all reference compounds except one; if so, then the exceptional reference compound is indicated to be present in the sample. A compound that is so indicated will be included in the solution set.

Test sample 1 was analyzed using a reference set of 50 hydrocarbon compounds with elemental compositions ranging from C6H, to C7H10. Analyses were made with the artificial sample spectrum as initially generated and with three perturbed artificial sample spectra. One compound, 3-methylpentane with elemental composition of C6H14, was removed initially from further consideration in all cases. No reference compounds were indicated initially to be present in any case. The computed analyses using each of the computer programs are shown in Table I. Test sample 1 had eight hypothetically legitimate compounds. All three methods gave exact analyses based on the initial (unperturbed) sample spectrum. All analyses based on the perturbed sample spectra correctly indicated the presence of all hypothetically legitimate compounds. There is some variation in the analyses based on the perturbed spectra, however. For the ±2% perturbed sample spectrum, the MAD analysis indicated the presence of two extraneous compounds.
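The initial presence check described in this section (a sample abundance that is greater, or less, than the abundances of all reference compounds except one at some m/q) can be sketched in Python as follows. The check works because, under the linearity property with nonnegative concentrations that sum to one, each sample abundance is a weighted average of the reference abundances at that m/q. The function name, the spectra, and the treatment of ties (an equal abundance counts as an exception) are assumptions made for illustration; indices are zero-based.

```python
def indicated_present(r, s):
    """Return zero-based indices of reference compounds indicated to be present."""
    n, m = len(r), len(s)
    indicated = set()
    for j in range(m):
        # Compounds that prevent "sample greater than every reference abundance" at this m/q.
        not_below = [i for i in range(n) if r[i][j] >= s[j]]
        # Compounds that prevent "sample less than every reference abundance" at this m/q.
        not_above = [i for i in range(n) if r[i][j] <= s[j]]
        for exceptions in (not_below, not_above):
            if len(exceptions) == 1:
                indicated.add(exceptions[0])
    return sorted(indicated)


# Hypothetical normalized reference spectra (3 compounds, 4 m/q values, S = 1).
reference = [
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 0.5, 0.25, 0.25],
    [0.25, 0.0, 0.25, 0.5],
]
# Sample spectrum of an equal mixture of the first two compounds.
sample = [0.25, 0.375, 0.25, 0.125]
print(indicated_present(reference, sample))
```

In this hypothetical case the first two compounds are flagged and the third, which is absent from the mixture, is not.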


The Least Squares and Chebyshev analyses each indicated the presence of three extraneous compounds, two of which were the same as those in the MAD analysis. The third extraneous compound was not the same in both analyses. The largest error in the relative concentration values for legitimate compounds occurred in the Chebyshev analysis; the relative concentration of 3-methyl-1-pentene was computed to be 0.121 as opposed to the hypothetical value of 0.146 in the test sample composition. The largest error for extraneous compounds also occurred in the Chebyshev analysis; the relative concentration of 1-methyl-1-ethylcyclopropane was computed to be 0.033. None of the three methods computed a relative concentration for an extraneous compound that was greater than the computed relative concentration for a legitimate compound.

For the ±5% perturbed sample spectrum, the Least Squares analysis indicated the presence of four extraneous compounds. The MAD and Chebyshev analyses each indicated the presence of five extraneous compounds. The same two extraneous compounds that were common to all three analyses based on the ±2% perturbed sample spectrum were again indicated to be present by all three analyses for the ±5% perturbed sample spectrum. No other extraneous compounds were common to all three analyses. The largest error for legitimate compounds occurred in the Chebyshev analysis; the relative concentration of 3-methyl-1-pentene was computed to be 0.064 as opposed to the hypothetical value of 0.146. The largest error for extraneous compounds also occurred in the Chebyshev analysis; the relative concentration of 1-methyl-1-ethylcyclopropane was computed to be 0.082. All three methods computed relative concentrations for that compound that were greater than the computed relative concentrations for some legitimate compounds.

For the ±10% perturbed sample spectrum, the MAD analysis indicated the presence of one extraneous compound.
The Least Squares and Chebyshev analyses each indicated the presence of the same five extraneous compounds. Those five compounds did not include the compound indicated by the MAD analysis or either of the two extraneous compounds that were common to all analyses based on the ±2% and the ±5% perturbed sample spectra. The largest error for legitimate compounds occurred in the Chebyshev analysis; the relative concentration of 2-methyl-1-pentene was computed to be 0.118 as opposed to the hypothetical value of 0.191. The largest error for extraneous compounds also occurred in the Chebyshev analysis; the relative concentration of ethylcyclobutane was computed to be 0.038. None of the three methods computed a relative concentration for an extraneous compound that was greater than the computed relative concentration for a legitimate compound. All three methods gave generally satisfactory analyses for this test case.

Test sample 2 was analyzed using a reference set of 38 hydrocarbon compounds with elemental compositions ranging from C& to C6H12. Analyses were made with the artificial sample spectrum as initially generated and with three perturbed artificial sample spectra. No reference compounds were removed initially from further consideration in any case. Three compounds were indicated initially to be present in all cases: 2-hexyne (C6H10), 2-methyl-2-pentene (C6H12), and cyclohexane (C6H12). The computed analyses are shown in Table II. Test sample 2 had a more complex composition than test sample 1. There were 21 hypothetically legitimate compounds as compared with eight in the first test case. This added complexity led to less precision in the computed analyses.

For the initial (unperturbed) sample spectrum, the MAD and Chebyshev analyses correctly identified the presence of all hypothetically legitimate compounds. Those two analyses did not indicate the presence of any extraneous compounds. The Least Squares analysis failed to indicate the presence of two legitimate compounds. The largest such error was the omission of 4-methyl-trans-2-pentene with a hypothetical relative concentration of 0.027. The Least Squares analysis indicated the presence of one extraneous compound; the relative concentration of trans-3-hexene was computed to be 0.005.

For the ±2% perturbed sample spectrum, the Chebyshev analysis failed to indicate the presence of four legitimate compounds. The Least Squares and MAD analyses each failed to indicate the presence of the same five legitimate compounds, four of which were those missing in the Chebyshev analysis. The largest error for legitimate compounds was the omission in all analyses of 2,3-dimethyl-1-butene with a hypothetical relative concentration of 0.081. The Chebyshev analysis indicated the presence of six extraneous compounds; the Least Squares analysis indicated seven; and the MAD analysis indicated ten. Six of the extraneous compounds were common to all analyses. The largest error for extraneous compounds occurred in the Chebyshev analysis; the relative concentration of 3-methyl-cis-2-pentene was computed to be 0.067.

For the ±5% perturbed sample spectrum, the Chebyshev analysis failed to indicate the presence of five legitimate compounds; the Least Squares analysis failed to indicate six; and the MAD analysis failed to indicate seven. Four of the missing compounds were common to all three analyses. Those four compounds were the same as the four missing compounds that were common to all three analyses based on the ±2% perturbed sample spectrum. The largest error for legitimate compounds was again the omission in all analyses of 2,3-dimethyl-1-butene with a hypothetical relative concentration of 0.081. The Chebyshev analysis indicated the presence of five extraneous compounds; the MAD analysis indicated six; and the Least Squares analysis indicated eight.
Two of the extraneous compounds were common to all analyses. Those two compounds were among the extraneous compounds that were common to all three analyses based on the ±2% perturbed sample spectrum. The largest error for extraneous compounds occurred in the Chebyshev analysis; the relative concentration of 3-methyl-cis-2-pentene was computed to be 0.092.

For the ±10% perturbed sample spectrum, the Least Squares analysis failed to indicate the presence of four legitimate compounds; the MAD analysis failed to indicate five; and the Chebyshev analysis failed to indicate six. Four of the missing compounds were common to all three analyses. Two of those compounds were among the four missing compounds that were common to all analyses based on the ±2% and the ±5% perturbed sample spectra. The largest error for legitimate compounds occurred in the Chebyshev analysis; it failed to indicate the presence of 1,1,2-trimethylcyclopropane with a hypothetical relative concentration of 0.068. All three analyses missed 1-hexene with a hypothetical relative concentration of 0.063. The Least Squares analysis indicated the presence of three extraneous compounds; the MAD and Chebyshev analyses each indicated four. Three of the extraneous compounds were common to all analyses. One of those three compounds was among the two extraneous compounds that were common to all analyses based on the ±2% and the ±5% perturbed sample spectra. The largest error for extraneous compounds occurred in the MAD analysis; the relative concentration of 3,3-dimethyl-1-butene was computed to be 0.058.

Test sample 3 was analyzed using the same set of 38 reference compounds as test sample 2. Again analyses were made with the artificial sample spectrum as initially generated and with three perturbed artificial sample spectra. No reference compounds were removed initially from further consideration in any case. Three compounds were indicated initially to be present in all cases: 3-methylcyclopentene (C6H10), 2-hexyne (C6H10), and cyclohexane (C6H12). The computed analyses are shown in Table III. Test sample 3 had 15 hypothetically legitimate compounds. All three methods gave exact analyses based on the initial (unperturbed) sample spectrum.

For the ±2% perturbed sample spectrum, all three analyses failed to indicate the presence of the same two legitimate compounds. The largest error for legitimate compounds was the omission of 3-methyl-trans-2-pentene with a hypothetical relative concentration of 0.091. The MAD analysis indicated the presence of three extraneous compounds; the Least Squares analysis indicated four; and the Chebyshev analysis indicated six. Three of the extraneous compounds were common to all analyses. The largest error for extraneous compounds occurred in the Least Squares analysis; the relative concentration of 2-ethyl-1-butene was computed to be 0.092.

For the ±5% perturbed sample spectrum, all three analyses failed to indicate the presence of the same two legitimate compounds. Those two compounds are the same compounds which were missed by all analyses based on the ±2% perturbed sample spectrum. The largest error for legitimate compounds occurred in the MAD analysis; the relative concentration of methylcyclopentane was computed to be 0.128 as opposed to the hypothetical value of 0.026 in the test sample composition. The Least Squares and MAD analyses each indicated the presence of one extraneous compound; the Chebyshev analysis indicated six. One extraneous compound was common to all three analyses. That compound was among the three extraneous compounds that were common to all analyses based on the ±2% perturbed sample spectrum.
The largest error for extraneous compounds occurred in the MAD analysis; the relative concentration of 2-ethyl-1-butene was computed to be 0.129.

For the ±10% perturbed sample spectrum, all three analyses again failed to indicate the presence of the same two legitimate compounds that were missed by all analyses based on the ±2% and the ±5% perturbed sample spectra. The largest error for legitimate compounds occurred in the Chebyshev analysis; the relative concentration of 1-methyl-1-ethylcyclopropane was computed to be 0.019 as opposed to the hypothetical value of 0.123. The MAD and Chebyshev analyses each indicated the presence of five extraneous compounds; the Least Squares analysis indicated six. The one extraneous compound that was common to all analyses based on the ±2% and the ±5% perturbed sample spectra was not indicated in any of the analyses based on the ±10% perturbed sample spectrum. The largest error for extraneous compounds occurred in the Chebyshev analysis; the relative concentration of 3-methyl-1-pentene was computed to be 0.066.

For some purposes, a complete identification of individual compounds present in a sample is not necessary; e.g., for the purpose of determining characteristic fuel properties of a hydrocarbon sample, it suffices to identify the presence and relative concentrations of certain compound groups. This idea is illustrated in the next example. Any grouping of compounds has the effect of reducing the number of reference spectra to be considered in making an analysis. Such a reduction affords obvious economies in the utilization of computer resources. Indeed, these economies may determine whether an analysis is feasible in some instances.

The initial spectrum for test sample 4 was generated using a reference set of 30 hydrocarbon compounds with elemental compositions ranging from C6H6 to C7H8. The 30 compounds were then grouped in the manner shown in Table IV. A composite spectrum for each group was constructed by defining the relative abundance at each mass-to-charge ratio to be the arithmetic mean of the relative abundances at that m/q of all compounds included in the group. This definition of the composite spectrum minimizes the sum of the squares of the differences between the relative abundances of the spectra of the individual compounds in the group and the relative abundances of the composite spectrum of the group. Analyses were made with the artificial sample spectrum as initially generated and with three perturbed artificial sample spectra. The aromatic group (C6H6) was removed initially from further consideration in all cases. Three groups were indicated initially to be present in all cases: diolefin, branched paraffin, and aromatic (C7H8). The computed analyses are shown in Table V.

For the initial (unperturbed) sample spectrum, the Least Squares and MAD analyses correctly indicated the presence of all five legitimate compound groups and no extraneous groups. The Chebyshev analysis failed to indicate the presence of one legitimate group and erroneously indicated the presence of one extraneous group. The largest error for legitimate compound groups occurred in the Chebyshev analysis due to the omission of the diolefin group with a hypothetical relative concentration of 0.131. The largest error for extraneous compound groups occurred in the Chebyshev analysis due to the inclusion of the olefin group with a computed relative concentration of 0.027.

For the ±2% perturbed sample spectrum, the Least Squares and MAD analyses indicated the presence of all legitimate compound groups. The Chebyshev analysis failed to indicate the presence of one legitimate compound group. The largest error for legitimate compound groups occurred in the Chebyshev analysis, again due to the omission of the diolefin group.
The MAD analysis did not indicate the presence of any extraneous compound groups; the Least Squares and Chebyshev analyses each indicated the presence of the same extraneous compound group as was indicated by the Chebyshev analysis based on the initial sample spectrum. The largest error for extraneous compound groups occurred in the Chebyshev analysis due to the inclusion of the olefin group with a computed relative concentration of 0.037.

For the ±5% perturbed sample spectrum, the Least Squares and MAD analyses indicated the presence of all legitimate compound groups. The Chebyshev analysis failed to indicate the presence of one legitimate compound group. The largest error for legitimate compound groups occurred in the Chebyshev analysis, due to the omission of the diolefin group. The MAD analysis did not indicate the presence of any extraneous compound groups; the Least Squares and Chebyshev analyses each indicated the presence of the same extraneous compound group as was indicated by those analyses based on the ±2% perturbed sample spectrum. The largest error for extraneous compound groups occurred in the Chebyshev analysis due to the inclusion of the olefin group with a computed relative concentration of 0.051.

For the ±10% perturbed sample spectrum, the Least Squares and MAD analyses indicated the presence of all legitimate compound groups. The Chebyshev analysis again failed to indicate the presence of one legitimate compound group. The largest error for legitimate compound groups occurred in the Chebyshev analysis, due to the omission of the diolefin group. The Least Squares and MAD analyses did not indicate the presence of any extraneous compound groups; the Chebyshev analysis indicated the presence of the same extraneous compound group as was indicated by that analysis based on the initial, the ±2%, and the ±5% perturbed sample spectra. The largest error for extraneous compound groups occurred in the Chebyshev analysis due to the inclusion of the olefin group with a computed relative concentration of 0.024.

Table IV. Compound Groups for Test Sample 4

Group classification    Compounds included in group
Aromatic (C6H6)         Benzene
Aromatic (C7H8)         Methylbenzene
Cyclic diolefin         1,3-Cyclohexadiene
Diolefin                1,5-Hexadiene
                        2,3-Dimethyl-1,3-butadiene
                        2-Methyl-1,3-pentadiene
Cyclic olefin           Cyclohexene
                        1-Methylcyclopentene
                        3-Methylcyclopentene
Olefin                  cis-2-Hexene
                        2-Methyl-2-pentene
                        2,3-Dimethyl-2-butene
                        4-Methyl-cis-2-pentene
                        cis-3-Hexene
                        3,3-Dimethyl-1-butene
                        4-Methyl-trans-2-pentene
                        3-Methyl-cis-2-pentene
                        2,3-Dimethyl-1-butene
                        2-Methyl-1-pentene
                        3-Methyl-1-pentene
                        2-Ethyl-1-butene
                        4-Methyl-1-pentene
                        1-Hexene
Cyclic alkane           Cyclohexane
                        Methylcyclopentane
Normal paraffin         n-Hexane
Branched paraffin       2-Methylpentane
                        3-Methylpentane
                        2,2-Dimethylbutane
                        2,3-Dimethylbutane
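
The least-squares property of the composite spectrum used for the compound groups can be checked numerically. The following is a minimal sketch in Python, assuming made-up relative abundances for three hypothetical compounds at three m/e values; the function names `composite_spectrum` and `sum_squared_deviation` are illustrative and do not come from the paper.

```python
def composite_spectrum(spectra):
    """Channel-wise arithmetic mean of equal-length spectra."""
    n = len(spectra)
    return [sum(channel) / n for channel in zip(*spectra)]


def sum_squared_deviation(spectra, candidate):
    """Sum over all compounds and channels of (abundance - candidate)**2."""
    return sum(
        (x - c) ** 2
        for spectrum in spectra
        for x, c in zip(spectrum, candidate)
    )


# Hypothetical relative abundances (not data from the paper):
# one row per compound in a group, one column per m/e value.
group = [
    [100.0, 40.0, 10.0],
    [80.0, 60.0, 20.0],
    [90.0, 50.0, 0.0],
]

mean_spectrum = composite_spectrum(group)

# Any other candidate, here the mean shifted by one unit in every
# channel, gives a larger sum of squared deviations.
shifted = [m + 1.0 for m in mean_spectrum]

best = sum_squared_deviation(group, mean_spectrum)
worse = sum_squared_deviation(group, shifted)
print(mean_spectrum, best, worse)
```

Shifting the candidate away from the mean in any direction increases the sum of squares, which is the sense in which the arithmetic mean is the optimal composite under this criterion.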

CONCLUSIONS All three methods of approximation have been used to analyze laboratory samples at the Laramie Energy Research Center. The computed analyses have been in good agreement with the known qualitative and quantitative characteristics of those samples. This experimental verification provides additional support for the validity of the three methods.

Another approach to a Least Squares Approximation model has been presented in (1). That approach is based on a statistical procedure known as multiple regression analysis (56). The Stepwise Regression Method of (1) has two disadvantages relative to the mathematical programming approaches which lead to the three methods of approximation.

First, in the Stepwise Regression Method, it is possible for compounds to have negative concentrations in a computed analysis. When such a situation occurs, it is necessary to remove the compounds with negative concentrations and perform another stepwise regression. Of course, it is possible that negative concentrations may occur for other compounds in the new analysis. It is impossible for any compound to have a negative concentration in an analysis computed by any of the three mathematical programming methods.

Second, the Stepwise Regression Method is computationally more cumbersome than the mathematical programming methods that the three approximation models are based on. At each step of the regression method, a compound is selected to be added to the group of compounds that constitute the qualitative model of the sample. The compound to be added is the one that gives the greatest reduction in the variance of the fit of the model spectrum to the sample spectrum. This necessitates the calculation of the variance of the fit for every compound being considered for addition to the qualitative model group at every step of the regression procedure. At each step of any one of the mathematical programming methods, the compound to enter the qualitative model and the compound to leave the qualitative model are determined by a simplex pivot rule. There is only one calculation of the corresponding new tableau at each step. Each of the mathematical programming methods represents a unified approach to the approximation problem in the sense that all alternatives are considered simultaneously at each step, whereas the multiple regression approach requires individual consideration of each alternative at each step.

Algorithms for the computer implementation of the mathematical programming techniques are readily accessible in the literature (50, 51, 54). Most computer centers have mathematical programming routines available in their users' libraries.

Each of the three methods of approximation presented has certain advantages and disadvantages associated with it. Least Squares approximation is the most widely known and used, and there is a large amount of statistical information available for it. MAD approximation is readily implemented by the use of the simplex algorithm from Linear Programming; this permits a sensitivity analysis of the solution from the final tableau. By a simple modification of the objective function, it is possible to weight overestimation errors differently from underestimation errors. Chebyshev approximation is also readily implemented by use of the simplex algorithm, and therefore it also permits a sensitivity analysis from the final tableau.

The ultimate decision as to which method is preferable rests with the user. An important consideration is the use that will be made of the computed analysis. Under certain circumstances, each method may be "best".

Although Least Squares methods have dominated the statistical literature on regression methods for many years, there has been a recent increase in interest in alternative methods (57-60).
This increase in interest is due to a growing awareness of the problems that can arise with a naive application of Least Squares. An important class of alternative methods is known as robust regression. Robust regression emphasizes methods that are not sensitive to deviations from normal distributions. Robust regression methods also reduce the effects of outliers (very bad data points) in data. The MAD method of approximation is a robust method of regression. General discussions of robustness are given in (61, 62).
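
The contrast between Least Squares and MAD approximation in the presence of an outlier, and the effect of weighting overestimation and underestimation errors differently, can be illustrated on a deliberately reduced problem. The sketch below assumes a single reference spectrum and a single nonnegative concentration, in which case a MAD solution is a weighted quantile of the peak-by-peak ratios and no simplex tableau is needed; it is not the multi-component Linear Programming formulation discussed in the paper. All data values and the names `mad_fit`, `least_squares_fit`, `w_under`, and `w_over` are hypothetical.

```python
def least_squares_fit(sample, reference):
    """Nonnegative x minimizing sum((s_i - a_i * x) ** 2).

    Assumes at least one reference abundance is nonzero.
    """
    num = sum(s * a for s, a in zip(sample, reference))
    den = sum(a * a for a in reference)
    return max(0.0, num / den)


def mad_fit(sample, reference, w_under=1.0, w_over=1.0):
    """Nonnegative x minimizing a weighted sum of absolute deviations.

    A peak is underestimated when a_i * x < s_i (cost w_under per unit)
    and overestimated when a_i * x > s_i (cost w_over per unit).
    With equal weights this is the ordinary MAD criterion.
    """
    # Peaks with a_i == 0 add only a constant to the objective: skip them.
    pairs = sorted((s / a, a) for s, a in zip(sample, reference) if a > 0)
    total = sum(a for _, a in pairs)
    # A minimizer is the weighted quantile of the ratios s_i / a_i at
    # level w_under / (w_under + w_over), using a_i as the weights.
    target = total * w_under / (w_under + w_over)
    cumulative = 0.0
    for ratio, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return max(0.0, ratio)
    return 0.0  # no peak with a positive reference abundance


# Hypothetical data: true concentration 0.5, first peak is an outlier.
reference = [10.0, 20.0, 40.0, 30.0]
sample = [25.0, 10.0, 20.0, 15.0]

x_ls = least_squares_fit(sample, reference)   # pulled upward by the outlier
x_mad = mad_fit(sample, reference)            # stays at 0.5

# Asymmetric weights on a second hypothetical data set.
flat_reference = [10.0, 10.0, 10.0, 10.0]
graded_sample = [1.0, 2.0, 3.0, 4.0]
x_low = mad_fit(graded_sample, flat_reference, w_under=1.0, w_over=3.0)
x_high = mad_fit(graded_sample, flat_reference, w_under=3.0, w_over=1.0)
print(x_ls, x_mad, x_low, x_high)
```

In this reduced setting the single bad peak shifts the Least Squares estimate but leaves the MAD estimate at the value supported by the remaining peaks, and penalizing overestimation more heavily moves the estimate toward a lower quantile of the ratios.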

LITERATURE CITED
(1) D. D. Tunnicliff and P. A. Wadsworth, Anal. Chem., 37, 1082-1085 (1965).
(2) J. I. Brauman, Anal. Chem., 38, 607-610 (1966).
(3) B. Pettersson and R. Ryhage, Anal. Chem., 39, 790-793 (1967).
(4) J. M. Ruth, Anal. Chem., 40, 747-750 (1968).
(5) L. R. Crawford and J. D. Morrison, Anal. Chem., 40, 1464-1469 (1968).
(6) L. R. Crawford and J. D. Morrison, Anal. Chem., 40, 1469-1474 (1968).
(7) D. D. Tunnicliff and P. A. Wadsworth, Anal. Chem., 40, 1826-1833 (1968).
(8) P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, Anal. Chem., 41, 21-27 (1969).
(9) P. C. Jurs, B. R. Kowalski, and T. L. Isenhour, Anal. Chem., 41, 690-695 (1969).
(10) L. R. Crawford and J. D. Morrison, Anal. Chem., 41, 994-998 (1969).
(11) B. R. Kowalski, P. C. Jurs, and T. L. Isenhour, Anal. Chem., 41, 1945-1953 (1969).
(12) P. C. Jurs, Anal. Chem., 43, 22-26 (1971).
(13) P. C. Jurs, Anal. Chem., 43, 364-367 (1971).
(14) L. B. Sybrandt and S. P. Perone, Anal. Chem., 43, 382-388 (1971).
(15) H. S. Hertz, R. A. Hites, and K. Biemann, Anal. Chem., 43, 681-691 (1971).
(16) T. L. Isenhour and P. C. Jurs, Anal. Chem., 43 (10), 20A-35A (1971).
(17) S. L. Grotch, Anal. Chem., 43, 1362-1370 (1971).
(18) D. H. Smith and G. Eglinton, Nature (London), 235, 325-328 (1972).
(19) D. D. Tunnicliff and P. A. Wadsworth, Anal. Chem., 45, 12-20 (1973).
(20) B. R. Kowalski and C. F. Bender, Anal. Chem., 45, 2234-2239 (1973).
(21) T. L. Isenhour and P. C. Jurs, "Learning Machines", in "Computer Fundamentals for Chemists", Vol. 1, J. S. Mattson, H. B. Mark, Jr., and H. C. MacDonald, Jr., Ed., Marcel Dekker, Inc., New York, N.Y., 1973.
(22) B. R. Kowalski, "Pattern Recognition in Chemical Research", in "Computers in Chemical and Biochemical Research", Vol. 2, C. E. Klopfenstein and C. L. Wilkins, Ed., Academic Press, New York, N.Y., 1974.
(23) T. L. Isenhour, B. R. Kowalski, and P. C. Jurs, Crit. Rev. Anal. Chem., 4, 1-44 (1974).
(24) J. B. Justice and T. L. Isenhour, Anal. Chem., 46, 223-226 (1974).
(25) H. A. Clark and P. C. Jurs, Anal. Chem., 47, 374-378 (1975).
(26) N. A. B. Gray and T. O. Gronneberg, Anal. Chem., 47, 419-424 (1975).
(27) T. R. Brunner, C. L. Wilkins, R. C. Williams, and P. J. McCombie, Anal. Chem., 47, 662-665 (1975).
(28) G. S. Zander, A. J. Stuper, and P. C. Jurs, Anal. Chem., 47, 1085-1093 (1975).
(29) G. S. Zander and P. C. Jurs, Anal. Chem., 47, 1562-1573 (1975).
(30) D. L. Duewer, B. R. Kowalski, and T. F. Schatzki, Anal. Chem., 47, 1573-1583 (1975).
(31) H. Abe and P. C. Jurs, Anal. Chem., 47, 1829-1846 (1975).
(32) C. L. Wilkins and T. L. Isenhour, Anal. Chem., 47, 1849-1851 (1975).
(33) G. L. Ritter, S. R. Lowry, C. L. Wilkins, and T. L. Isenhour, Anal. Chem., 47, 1951-1956 (1975).
(34) B. R. Kowalski, Anal. Chem., 47, 1152A-1182A (1975).
(35) J. B. Justice, Jr., and T. L. Isenhour, Anal. Chem., 47, 2286-2288 (1975).
(36) P. C. Jurs and T. L. Isenhour, "Chemical Applications of Pattern Recognition", Wiley-Interscience, New York, N.Y., 1975.
(37) G. L. Ritter, S. R. Lowry, T. L. Isenhour, and C. L. Wilkins, Anal. Chem., 48, 591-595 (1975).
(38) T. R. Brunner, C. L. Wilkins, T. F. Lam, L. J. Soltzberg, and S. L. Kaberline, Anal. Chem., 48, 1146-1150 (1976).
(39) R. B. LeBlanc, "Mass Spectrometry: Analytical Chemical Applications", in "The Encyclopedia of Spectroscopy", G. L. Clark, Ed., Reinhold Publishing Corp., New York, N.Y., 1960.
(40) S. N. Deming and S. L. Morgan, Anal. Chem., 45, 278A-283A (1973).
(41) D. L. Massart and L. Kaufman, Anal. Chem., 47, 1244A-1257A (1975).
(42) G. Hadley, "Linear Programming", Addison-Wesley Publishing Co., Reading, Mass., 1962.
(43) D. G. Luenberger, "Introduction to Linear and Nonlinear Programming", Addison-Wesley Publishing Co., Reading, Mass., 1973.
(44) G. Hadley, "Nonlinear and Dynamic Programming", Addison-Wesley Publishing Co., Reading, Mass., 1964.
(45) J. E. Kelley, Jr., J. Soc. Ind. Appl. Math., 6, 15-22 (1958).
(46) H. Swanson and R. E. D. Woolsey, to appear in the Association for Computing Machinery (ACM) Special Interest Group Mathematical Programming (SIG-MAP) Newsletter.
(47) I. Barrodale and A. Young, Numerische Mathematik, 8, 295-306 (1966).
(48) I. Barrodale and F. D. K. Roberts, SIAM J. Numer. Anal., 10, 839-848 (1973).
(49) N. N. Abdelmalek, BIT, 15, 117-129 (1975).
(50) A. Ravindran, Commun. ACM, 15, 818-820 (1972).
(51) A. Ravindran, Commun. ACM, 17, 157 (1974).
(52) A. Ravindran, Opsearch, 7, 241-262 (1970).
(53) C. E. Lemke, Manage. Sci., 11, 681-689 (1965).
(54) I. Barrodale and F. D. K. Roberts, Commun. ACM, 17, 319-320 (1974).
(55) E. Stenhagen, S. Abrahamsson, and F. W. McLafferty, Ed., "Atlas of Mass Spectral Data", Vol. 1 and 2, Wiley-Interscience, New York, N.Y., 1969.
(56) M. A. Efroymson, "Multiple Regression Analysis", in "Mathematical Methods for Digital Computers", Vol. 1, A. Ralston and H. S. Wilf, Ed., John Wiley and Sons, New York, N.Y., 1960.
(57) P. Bloomfield and G. S. Watson, Biometrika, 62, 121-128 (1975).
(58) H. L. Harter, Int. Stat. Rev., 43, 269-278 (1975).
(59) R. W. Hill and P. W. Holland, J. Am. Stat. Assoc., 72, No. 360, 828-833 (1977).
(60) J. F. Claerbout and F. Muir, Geophysics, 38, 826-844 (1973).
(61) P. J. Huber, Ann. of Math. Stat., 43, 1041-1067 (1972).
(62) F. R. Hampel, Z. Wahrscheinlichkeitstheorie, 27, 87-104 (1973).

RECEIVED for review January 21, 1975. Accepted January 26, 1978.