Resolution of Overlapping Absorption Bands by Least Squares

lose samples used in this study. The results obtained in this investiga- tion show7 that the C-2 position of the anhydroglucose unit of amylose is by ...
0 downloads 0 Views 463KB Size
portion of the hydrolyzate t,han in the 0.52-M.S. sample. A number of new unidentified peaks have also appeared. A sample of commercially available hydroxyethyl cellulose treated in a manner similar to the hydroxyethyl amylose samples revealed the presence of a minimum of 20 components. The BO-hydroxyethyl derivative was prese n t in the largest quantity of the substituted D-glucose which agrees with previous workers on hydroxyethyl cellulose ( 2 , 4). The large number of products is probably t’he result of the heterogeneous reaction c o n d i h n s used in the productmionof such derivatives as opposed to the homogeneous conditions used to prepare the hydroxyethyl amylose samplw used in t’hie study. The result,s obtained in this investigation shaw that the (3-2 posit,ion of the anhydroglucose unit, of amylose is by far the more react,ive during hydrosyet’hylation in cont,rast to the findings of Brownell and Purves ($3) and Croon and Lindberg (4) that the (3-6 position is t’he

more react,ive in the hydroxyethylation of cellulose. Our data are in agreement with the findings reported by Husemann and Kafka (6) for hydroxyethyl amylose prepared under different conditions. Our findings show the great potential of GLC in the study of the distribution of substituents in modified polysaccharides due to its speed, resolution, and ease of quantitation. ACKNOWLEDGMENT

Thanks are extended to R. Ray Estes for preparation of the hydroxyethyl amylose samples used in this investigation. The support and deep interest of and the many helpful discussions with E. E . Fisher are gratef ully acknowledged, LITERATURE CITED

(1) Brobst, K. M., Lott, C. E., Cereal Chem. 43, 35 (1966). (2) Brownell, H. H., Purves, C. B., Can. J . Chem. 35, 677 (1957).

(3) Cohen, S. G., Haas, H. C., J. Am. Chem. SOC.72. 3954 11950). (4) Croon, I., ’ Lindberg, ‘ B., Svensk. Pa perstidn 59, 794 (1956). (5) ermst,ad, E. T., in “Industrial Gums,” R. L. Whistler, ed., D. 727. Academic Press, New York, 1959. (6) Husemann, E., Kafka. M.. Makromol. Chem. 41, 208 (1960) (7) Shvluk. W. P.. Timell. T E ., Cnn .TChek. 34, 571 (1956). (8) Ibid., p. 575. (9) A. E. Stalev Manufacturing Co., Brit. Patent 978,495 (Dec. 23, 1964). (10) Sweeley, C. C., Bentley, R., Makita, M., Wells, W. W., J . Am. Chem. floc. 85,2497 (1963). (11) Tasker, C. W., Purves, C. B., Ibid., 71, 1023 (1949).

I4

-

1

-

CLARENCE E. LOTT,JR. KENNETH M. BROBST Research Center A. E. Staley Manufacturing Co. Decatur, Ill. 62525 Presented in part at Division of Carbohydrate Chemistry and Division of Cellulose, Wood, and Fiber Chemistry, Winter Meeting, ACS, Phoenix, Aria., January 1966. The authors thank the A. E. Staley Manufacturing Co. for permission to publish this material.

Resolution of Overlapping Absorption Bands by Least Squares Procedures SIR: With the increasing use of absorption methods in analytical techniques, the experimental data are frequently in the form of a series of measurement,s of a property Y, for example, absorbance, a t different values of some variable X , for example wavelength. I n many cases E’ has the form of a series of bands, and the heights of the band maxima, the values of X at these maxima, and the areas under t,he bands must be estimated. When overlap between neighboring bands occurs, the estimation of these quantities becomes very difficult and may be a source of considerable error (1: 10). Various approximate, and t’o a certain ext,ent arbitrary, procedures for dealing with this problem have been suggested ( 9 , 11) b u t they are unsat,isfactory when overlap is considerable or when the band width at half height is variable. Some use has been made of least squares optimization procedures in certain aspects of the processing of data of this type ( 2 , 5, 5, 7 , 8),but the general availability of high-speed digital computers invites the possibility of performing routine separations of overlapping bands by this method. The present, communication describes the way in which this may be accomplished and includes examples of its application t80specific analytical problems. LEAST SQUARES PROCEDURE

I n general, the experimental data consist of n observat,ions.

1770

ANALYTICAL CHEMISTRY

Y1, . Y t , .Yn,where Y,= Y ( X i ) , ,

,

which are to be fitted to a function

F ( X , Pi,

,

, ,

. , , . ,, P m )

(1)

where P I , .. . . . , P , are m, parameters defining the component bands and a base line. The best, fit is obtained when the parameters are chosen so as to minimize the funct’ion

s=

c TV,(F, -

proximate values of the parameters, PI’, . . . . . , PIrn. The condition that S in Equation 2 is a minimum requires 8s = 0 and together with Equation 4 a set of m simultaneous equations is thus obtained for the required parameter adjustments

n

(2)

Yi)2

i=l

(k = 1, . . . . . , m) (5)

where W i is a weighting function and

Fi

=

F ( X i , P I , . . . . ., P m )

(3)

In general F is not a linear function of the m parameters and the optimum values of P I , .. . . . . , P , cannot readily be calculat,ed from Equation 2. HOWever, if approximat,e values P’1, . . . . . . , P’, of the parameters can be guessed, F , can be taken as a linear function of the adjustments, A P j = Pj-P’, ( j = l , .. . , , m ) , required to optimize the parameters by using Taylor’s expansion and neglecting second and higher order terms. This gives

FORM OF F ( X , PI,

....,Pm)

The function F in Equation 1 may conveniently be expressed in the form N

F=CA,+B

(6)

z=1

where AI, . . . . . , A N are functions each of which generates a single band and B generates a base line. I n many cases experimental data can be fitted closely by a series of Gaussian functions, in which case

A,

=

P3,--2 exp{-ln 2 [ 2 ( X

-

P3%--1)/P3,12)(7)

(i = 1, . . .

. )

n)

(4)

where the prime indicates that Pi and

?!? dPj

are to be evaluated for the a p

or Cauchy functions, in which case

A , = Ps,-2/(1

+ [2(X - P3%-,)/Pd) (8)

which is then minimized. I n Equation 9, d is a damping factor which is chosen empirically and it may be shown (4, 6) that with the optimum choice of the coefficients C,,. . . . . , C,,this modification merely entails multiplying the coefficient of APk in the kth equat,ion in Equation 5 by (l+d2). Another circumstance in which the iteration process may diverge is when small bands are overlapped by larger bands. I n this case, it is advisable to carry out an initial optimization omitting the parameters appropriate to the weak bands and use the parameters so obtained as the trial parameters for a further optimization in which all the parameters are included. d third method, which is perhaps the most elegant, is by the introduction of special conditions as described in the nest section. The iteration process is prevented from diverging by fising the X value of the peak and the half width of each of the weak bands. These restrictions are progressively removed as the final solution is approached.

*03

.02

E" n I

(Y

uc

d 6

t

.oi

INTRODUCTION OF SPECIAL CONDITIONS

5

IO VOLUME OF EFFLUENT (arbitrary units)

I5

Figure 1 . Portion of output ( 0 ) from Spinco 120B automatic amino acid analyzer showing peaks due to aspartic acid ( A ) , methionine sulfone (B), threonine (C), and serine (D)

I n both cases three parameters are sufficient to define each band; the peak the value of X a t the height, P3L--2; peak, P3iPl and the band width a t half height, Pa,. The function B is simply P ~ . vif ~ the ~ base line is flat or P3s+1 X P ~ . v +for~ a sloping base line. The advantage of incorporating B in Equation 6 is that the experimentally determined values may be measured above any convenient and arbitrary base line and this will automatically be cor+ ~ Pa,v+z rected for> because P ; $ ~ v(and in the case of a sloping base line) are introduced as independent parameters. The only practical restric,tion on the choice of a function A to represent a band is that analyt,ical expressions must be obtained for its partial derivat,ive wit'h respect, to each of the parameters although this restriction could be overcome by the use of numerical methods.

+

METHOD OF SOLUTION

Using the values of the trial parameters PI',

. . . . ~,P,'

the coefficients

in Equation 5 can be coinputeL for each of the m equations and the values of the AP,'s may then be obtained by matrix inversion. Routines for the solution of simultaneous linear equations by this method are available at almost all computer centers. As F is not a linear function of the parameters, the corrections calculated from Equation 5 will not completely optimize the parameters and the process must be repeated using the improved parameters as a new set of trial parameters. This process of iteration is continued until the parameter adjustments calculated from Equation 5 are negligibly small. CONVERGENCE OF SOLUTION

If the initial choice of parameters is close to optimum only two or three cycles of iteration are needed; but if the initial choice is poor, the process may not converge and the method fails. I n many cases this difficulty may be overcome by the use of the damped least squares procedure described by Levenberg (4). I n this method the parameter adjustments are introduced as extra terms in Equation 2 giving

It frequently happens that the parameters must be refined subject to certain special conditions, for example the X values of the band masinia may be known accurately and so should not be treated as independent parameters. The simplest method of introducing such restriceions into the optimization is to express them as a series of equations of the form Gk(P1,, ,

, ,

, P,,,)

=

0 (k

1, , . . ,, r )

=

(10) for example if t,he peaks of a series of N Gaussian bands were knoivn t'o occur at PI1, . . . . , Pas-?, then Equa,tion 10 would have the form GL =

P'3k-z

- Pzk-2

0 ( k = 1, . . . ., N)

=

The restrictions are conveniently introduced by minimizing n

S

TVi(F,

=

- Yi)'

+

i=I

r XkGk

k=l

(11)

where the Xis are Lagange's undetermined multipliers ( 5 ) . Setting SS = 0 in Equation 11 yields a set of m r simultaneous linear equations

+

of the AP,'s and the constant term n (.%)I

(1

=

1, . .

,

, mj

(12)

i=l

VOL. 38, NO. 12, NOVEMBER 1966

1771

DISTANCE ON FILM(rnm)

Figure 2. Equatorial distribution of intensity in the x-ray diffraction pattern of silk fibroin analyzed into a series of Cauchy functions and a linear base line

( k = 1, . . ., T )

(13)

where G ’ k = G(P’,, . . . . . , P’m). The solution of these equations gives the required parameters adjustments subject to the special conditions. METHOD

1600 1650 WAVE NUMBER (crn?)

1700

1750

Figure 3. Infrared spectra obtained from a section of silk suture with the electric vector vibrating ( a ) parallel and (b) perpendicular to the suture axis

OF COMPUTATION

The programming of the least squares procedure for the resolution of overlapping absorption bands does not present any special difficulties and a program in FORTRAK IV written for the CDC 3600 computer is used routinely for this purpose. This program is available from the authors. Data input is by punched paper tape and the operation of the program-e.g., number of cycles of refinement, damping factor, specification of special conditions, etc.is controlled through the same medium. I n the output, the observations and trial parameters are listed together with a standard deviation

After each cycle the parameter adjustments are listed and a new value of u calculated using the refined parameters. I n this way the convergence of the solution can be followed and with zero damping factor two or three cycles are usually sufficient. I n addition to listing the final values of the parameters and F , ( X , P,, . . , , P,) provision is made in the program to produce a graphical record of the computed components on a Calcomp plotter.

1772

1550

ANALYTICAL CHEMISTRY

The number of parameters which may be optimized depends on the number of observations and the storage available in the commter. In the uresent Drogram provision is made for 45 parameters and 300 observations. The time taken to complete an analysis is a function both of the numbe; of observations and the number of parameters, but as a guide the analysis in Figure 1 was completed in three cycles a t an average speed of 20 seconds per cycle. ERRORS

When the refinement process has been completed, the deviations between the observed values ( Y J and fitted values ( F $ ’ ) will be distributed normally and the standard deviation as calculated in Equation 14 will contain two components. The first will be a systematic one due to any differences between the assumed and actual curve shapes, and the second xi11 be due to random errors of measurement in the initial observations. Thus the standard deviation may be used as a guide to the selection of an appropriate curve shape in the early stages of a new application of the method. I n the examples given in Figures 1, 2, and 3 the major part of

the standard deviation is due to random errors in the original observations. APPLICATION OF THE METHOD

Three examples of the application of the method to widely different analytical problems are illustrated in Figures 1, 2, and 3. I n the first application, illustrated in Figure l , portion of the output from a Spinco Model 120B automatic amino acid analyzer containing the peaks due to aspartic acid, methionine sulfone, threonine, and serine is resolved into individual bands, four Gaussian components, and a linear base line. The sum of these is shown as a full curve. The standard deviation between the observed points and the calculated curve is 0.0003. I n the second application shown in Figure 2 a trace of the equatorial distribution of intensity in the x-ray diffraction pattern of silk fibroin is analyzed into a series of Cauchy functions and a linear base line. The observed curve was fitted a t 91 equispaced abscissa values with a standard deviation of 0.005. The final example shows a more complex anaIysis of polarized infrared spectra obtained from a section of silk suture with the electric vector vibrating

parallel (Figure 3a) and perpendicular (Figure 36) to the suture axis. The kIand function assumed to be a mixture of Cauchy and Gauss functions and special conditions were introduced which required that corresponding components in the two spectra had the same position and half band width. The spectra have been analyzed into five bands which are linear combination: of 40% Gauss/60% (lauchy functions. The curves lvere fitted at g5 equisIJaced with a standard deviation of 0.011.

LITERATURE CITED

c. spectry.

( I ) Giese, A. T., French, S.,A p p l . Spectry. 9, 78 (1955). (2) Hart, R. R., J . Luol. 17, 368 (1965). ( 3 ) Jones, R. XI., Sechadri, K..S., Hopkins,

(4) J. Levenberg, w . Can. ~ J . Chem. K., Quart. 409 334A p(1962). p l . Math. 2, 164 (1944). ( 5 ) Marsh, R. E., Corey, R. B., Pauling, L., Biochim. Biophys. Acta 16, 1 (1955). (6) hfeison, J.7 J . Opt. SOC. Am. 5 5 , 1105 (1965). (7) Savitzky, 4.,Golay, h4. J. E., ASAL. CHEM.36, 1627 (1964).

(8) Stone, H., J . Opt. SOC.Am. 5 2 , 998 (1962). (9) Tubom’Jra~ T . ~J . Chem. Ph?/s* 24, 927 (1956). (10) Vandenbelt, J. M., Heinrich, C., i l p p l . Spectry. 7, 171 (1953). (11) Longo, Yonda, Tu’. A., A.,Filmer, Hirs, C. D. H. H.,W., Pate, Anal. H.,

Biochem. 10,53 (1963). R. D. B. FRASER

EIKICHISUZUKI

CSIRO Division of Protein Chemistry Parkville N. 2 Victoria, Australia

Simulation of Counter Double Current Distribution by Digital Computer SIR: Counter double current distribution (CDCD) differs from countercurrent distribution (CCD) and steadystate distribution (SSI)) in that both solvents move simultaneously in C D C D (S), whereas in CCD only one phase niovec (6) and in SSD the phases move independently (1). CCD has an adequate mathematical description ( 6 ) , but C D C D and SSD are not so easily described, and no general equations have been derived relating concentrations in the machine to partition coefficient, feed weight, feed position, transfer number, solvent ratio, and number of tubes in the fractionating train. Now a digital computer has been programmed to simulate the CDCD process. This discussion, however, is limited to the behavior observed when cperating in the preparative mode or “procedure 3” as deGignated by Post and Craig (3). EXPERIMENTAL

Assuming that a given mixture of qolute. is presented for separation and that an inimiscible solvent pair is available for which there are measured

partition coefficients, two parameters of the C D C D process are the solvent ratio ( R ) , which multiplied by the partition coefficient ( K ) gives the effective extraction coefficient ( E ) or RK = E , and the position of the feed tube. The number of tubes in an apparatus is also a parameter, and as used in this paper mill be 25, the number of tubes in our instrument. Rate of feed per transfer may, within limits, also be a parameter; however, the rate is assumed to be below an amount that would cause variations in partition coefficient and consequently may be ignored. I n the computer program written to simulate CDCD, the solute content of any given tube is calculated from the previous solute content in the two tubes directly to the right and left of it based on amounts specified by the extraction coefficient. Solutes leaving a particular tube do so as specified by the extraction coefficient, one fraction going to the right and one to the left. I n contrast to a more analytic approach to programming, this mathematical approach involves no cumulative constants, has no exponential terms, and will run indefinitely or as required. Input data consist of the partition co-

efficient, the solvent ratio, the amount of feed, the feed tube number, the number of tubes in the fractionating train, and the frequency of printout. The program may be stopped when either a specified number of transfers or any specified degree of steady state designated by a printed message is attained. This program written in Fortran I1 is available on cards from the authors upon request. RESULTS

The course of attainment of steady state for extraction coefficient E = 1.0, center feed, and 25 tubes is given in Figure 1. I n this and the following two figures, the ratio of sample weight per tube to the weight of feed per transfer ( T / F ) is the ordinate. An extraction coefficient of 1.0 with center feed constitutes the set of conditions resulting in the slowest possible attainment of steady state for 25 tubes, 665 transfers; all other variations with regard to position of feed and extraction coefficient reduce the transfers required. The parabolic-appearing curves of the initial transfers approach the straight

I

“I

Tube Number Figure 1. Rate of attainment of steady state for extraction coefficient 1.0 and center feed (tube 13) of a 25-tube extraction train for the given number of transfers ( T F ) The ratio of weight per tube to the weight of feed pertransfer is designated as

r/F

5

Tube Number Figure 2. Steady state calculations for extraction coefficient 1 .O and feed tubes 1, 3, 5,7,9, 1 1 , and 13 VOL. 38,

NO.

12, NOVEMBER 1966

1773