
Determination of Microscopic Acid Dissociation Constants by Nuclear Magnetic Resonance Spectrometry

Dallas L. Rabenstein* and Thomas L. Sayer

Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada

The determination by NMR of the microscopic acid dissociation constants of polyprotic acids having two groups of similar acidity has been studied. The change in chemical shift of a unique resonance with changes in pH gives directly the fractional deprotonation of one of the two groups as a function of pH. A nonlinear least squares curve-fitting method for evaluating the microscopic acid dissociation constants from the fractional deprotonation data makes use of macroscopic acid dissociation constants which are also evaluated from the NMR data. The macroscopic and microscopic acid dissociation constants for the two ammonium groups of ethylenediaminemonoacetic acid and of lysine have been determined. The microscopic constants calculated by the curve-fitting method are compared to those derived from the same NMR data by literature calculation procedures.

The acid-base chemistry of polyprotic acids usually is characterized in terms of macroscopic acid dissociation constants, which are composites of the microscopic constants for the individual acidic groups. If the microconstants of the individual groups are quite different, deprotonation will proceed almost exclusively by a single pathway and the microconstants for that pathway are essentially equal to the macroconstants. However, if the microconstants for the individual groups are within several orders of magnitude of each other, the macroconstants cannot be assigned to specific groups.

Of the various methods which have been used to determine microscopic acid dissociation constants for two groups of similar acidity (1-11), those in which the constants are calculated from the fractional deprotonation of one of the two groups as a function of pH are the most direct (3-5, 8-11). Of the techniques used to measure the fractional deprotonation, NMR potentially has the widest applicability since all organic acids have NMR-active nuclei whose chemical shifts change as a result of deprotonation of nearby acidic groups. Depending on the location of an NMR-active nucleus relative to the two acidic groups, its chemical shift may depend on the degree of protonation of only one of the groups (a unique resonance) or of both groups (a common resonance). Changes in the chemical shift of a unique resonance are linearly related to the degree of protonation of the acidic group whose protonation state affects its chemical shift.

This paper presents a method for calculating microscopic acid dissociation constants which is particularly suited to their evaluation from changes in the chemical shifts of unique resonances. The method is based on nonlinear least squares curve-fitting of the fractional deprotonation of one of the two groups as a function of pH and makes use of macroscopic constants which are determined from NMR data from the same experiment. Application of this method is illustrated with the determination of the microconstants for the two ammonium groups of ethylenediaminemonoacetic acid and of lysine. The microconstants so obtained are compared to those derived from the same NMR data by previously described procedures (4, 10, 11).

EXPERIMENTAL

Chemicals. L-Lysine monohydrochloride (Eastman Organic Chemicals) was used as received. Ethylenediaminemonoacetic acid dihydrochloride was prepared by a literature method (12). A stock solution of tetramethylammonium (TMA) nitrate was prepared by titration of a 25% aqueous solution of TMA hydroxide (Eastman Organic Chemicals) with HNO3 to a neutral pH.

pH Measurements. pH measurements were made with Orion Models 701 and 801 pH meters equipped with a standard glass electrode and a porous ceramic junction, saturated calomel reference electrode. The pH meter was calibrated with standard solutions of pH 4.00, 7.00, and 10.00 at 25 °C. Acid dissociation constants were evaluated both as mixed activity-concentration constants (activity of hydrogen ion, concentrations of acid and its conjugate base) and as concentration constants. Hydrogen ion concentrations were obtained from the pH meter readings using activity coefficients calculated with the Davies equation (13, 14). The acid dissociation constants reported in the tables are mixed constants; factors for conversion to concentration constants are given in the footnotes to the tables.

NMR Measurements. Proton NMR spectra were obtained on Varian A60D or HA-100 high resolution spectrometers at 25 ± 1 °C. Chemical shifts were measured relative to the central resonance of the TMA triplet, and are reported relative to the methyl proton resonance of sodium 2,2-dimethyl-2-silapentane-5-sulfonate (DSS). Positive shifts indicate protons less shielded than the methyl protons of DSS.

Sample Preparation. Solutions were prepared in distilled water and contained 0.005 M TMA for an internal chemical shift reference. KCl was added to the 0.02 M ethylenediaminemonoacetic acid (EDMA) solution to maintain a constant ionic strength of 0.16 M; the concentration of lysine (0.19 M) was so high it was not practical to control the ionic strength with inert electrolyte. The pH was adjusted with concentrated HNO3 and KOH to minimize dilution.

Curve-fitting Calculations. The curve-fitting calculations to obtain the "best" microscopic and macroscopic constants as judged by the least-squares criterion were performed with the program KINET, developed by J. L. Dye and V. A. Nicely (15). The uncertainties reported with the constants are linear estimates of the standard deviation, as calculated by KINET.

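RESULTS

Evaluation of Microconstants. The microscopic acid dissociation of the diprotic acid $H_iAH_j$ can be described by the scheme

$$
\begin{array}{ccc}
 & \mathrm{AH}_j\ (\mathrm{II}) & \\
{}^{k_i}\!\nearrow & & \searrow^{k_{ij}} \\
\mathrm{H}_i\mathrm{AH}_j\ (\mathrm{I}) & & \mathrm{A}\ (\mathrm{IV}) \\
{}_{k_j}\!\searrow & & \nearrow_{k_{ji}} \\
 & \mathrm{H}_i\mathrm{A}\ (\mathrm{III}) &
\end{array}
$$

where the subscripts $i$ and $j$ indicate the protons on acidic groups $i$ and $j$, and $k_i$, $k_j$, $k_{ij}$, and $k_{ji}$ are the microscopic acid dissociation constants. The deprotonation reaction to which a given microconstant refers is indicated by its subscript, with the last letter in the subscript denoting the group involved in the deprotonation step to which the constant applies and a preceding letter, if present, denoting the group which was deprotonated in a preceding step. The microconstants are defined as

$$k_i = \frac{a_{\mathrm{H}^+}[\mathrm{II}]}{[\mathrm{I}]} \tag{1}$$

$$k_j = \frac{a_{\mathrm{H}^+}[\mathrm{III}]}{[\mathrm{I}]} \tag{2}$$

$$k_{ij} = \frac{a_{\mathrm{H}^+}[\mathrm{IV}]}{[\mathrm{II}]} \tag{3}$$

$$k_{ji} = \frac{a_{\mathrm{H}^+}[\mathrm{IV}]}{[\mathrm{III}]} \tag{4}$$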

ANALYTICAL CHEMISTRY, VOL. 48, NO. 8, JULY 1976


In terms of the individual species, the macroconstants are defined as

$$K_1 = \frac{a_{\mathrm{H}^+}([\mathrm{II}] + [\mathrm{III}])}{[\mathrm{I}]} \tag{5}$$

$$K_2 = \frac{a_{\mathrm{H}^+}[\mathrm{IV}]}{[\mathrm{II}] + [\mathrm{III}]} \tag{6}$$

which leads to

$$K_1 = k_i + k_j \tag{7}$$

$$K_2 = \frac{k_{ij}k_{ji}}{k_{ij} + k_{ji}} \tag{8}$$

$$K_1K_2 = k_ik_{ij} = k_jk_{ji} \tag{9}$$

The first step in the determination of microconstants by the method which is the subject of this paper is the calculation of the fractional deprotonation of one of the two acidic groups as a function of pH from the chemical shift of a unique resonance. The chemical shift of a unique resonance for acidic group $i$ is the weighted average of the chemical shifts of those forms in which acidic group $i$ is protonated and deprotonated, as given by Equation 10.

$$\delta_{obsd} = f_{i,p}\delta_{i,p} + f_{i,d}\delta_{i,d} \tag{10}$$

where $f_{i,p}$ and $f_{i,d}$ represent the fractional protonation and deprotonation of group $i$ and $\delta_{i,p}$ and $\delta_{i,d}$ are the chemical shifts of the unique resonance of those forms in which group $i$ is protonated and deprotonated. For the above scheme, $\delta_{i,p} = \delta_{\mathrm{I}} = \delta_{\mathrm{III}}$ and $\delta_{i,d} = \delta_{\mathrm{II}} = \delta_{\mathrm{IV}}$. Substitution of $f_{i,p} = 1 - f_{i,d}$ into Equation 10 leads to

$$f_{i,d} = \frac{\delta_{obsd} - \delta_{i,p}}{\delta_{i,d} - \delta_{i,p}} \tag{11}$$

Microconstants are then calculated from the fractional deprotonation as a function of pH data by curve-fitting. For the above scheme, the fractional deprotonation of group $i$ is defined as

$$f_{i,d} = \frac{[\mathrm{II}] + [\mathrm{IV}]}{[\mathrm{I}] + [\mathrm{II}] + [\mathrm{III}] + [\mathrm{IV}]} \tag{12}$$

Substitution of Equations 1-3 into Equation 12 gives

$$f_{i,d} = \frac{k_ia_{\mathrm{H}^+} + k_ik_{ij}}{a_{\mathrm{H}^+}^2 + (k_i + k_j)a_{\mathrm{H}^+} + k_ik_{ij}} \tag{13}$$

or

$$f_{i,d} = \frac{k_ia_{\mathrm{H}^+} + k_ik_{ij}}{a_{\mathrm{H}^+}^2 + K_1a_{\mathrm{H}^+} + K_1K_2} \tag{14}$$

when Equations 7 and 9 are introduced into the denominator. Microconstants $k_i$ and $k_{ij}$ are evaluated by curve-fitting the fractional deprotonation data to Equation 14, using values for macroconstants $K_1$ and $K_2$ which have been determined from NMR data from the same experiment, as described below. Microconstants $k_j$ and $k_{ji}$ are then calculated from $k_i$, $k_{ij}$, and $K_1$. The initial values of the microconstants required by the curve-fitting program KINET (15) can be obtained from $pM_i$ vs. $f_{i,d}$ plots (4), where

$$pM_i = \mathrm{pH} - \log\frac{f_{i,d}}{1 - f_{i,d}} \tag{15}$$

$pk_i = pM_i$ at $f_{i,d}$ extrapolated to 0 and $pk_{ji} = pM_i$ at $f_{i,d}$ extrapolated to 1 (4). These values for $pk_i$ and $pk_{ji}$ are then used with the value of the pH ($= pM_i$) at $f_{i,d} = 0.5$ to obtain $k_j$ from $k_j = a_{\mathrm{H}^+}(k_i - a_{\mathrm{H}^+})/(a_{\mathrm{H}^+} - k_{ji})$. $k_{ij}$ then is calculated from $k_{ij} = k_jk_{ji}/k_i$. Alternatively, $pk_{ij}$ can be estimated from $pK_2$ and $pk_{ji}$.

Evaluation of Macroconstants. The macroconstants needed for the evaluation of the microconstants generally can be calculated from chemical shift data obtained in the same experiment. The advantage of using macroconstants so determined is that they apply to the experimental conditions. If the polyprotic acid has a resonance whose chemical shift depends on the state of protonation of the two acidic groups which deprotonate simultaneously, that is, a common resonance for the two acidic groups, its chemical shift is given by

$$\delta_{obsd} = \frac{\delta_{\mathrm{H_2A}}a_{\mathrm{H}^+}^2 + \delta_{\mathrm{HA}}K_1a_{\mathrm{H}^+} + \delta_{\mathrm{A}}K_1K_2}{a_{\mathrm{H}^+}^2 + K_1a_{\mathrm{H}^+} + K_1K_2} \tag{16}$$

where $\delta_{\mathrm{H_2A}}$, $\delta_{\mathrm{HA}}$, and $\delta_{\mathrm{A}}$ are the chemical shifts of the common resonance for the protonation states indicated by the subscripts. $\delta_{\mathrm{H_2A}}$ and $\delta_{\mathrm{A}}$ are obtained from the NMR spectra of sufficiently acidic and basic solutions. Macroscopic constants $K_1$ and $K_2$ and $\delta_{\mathrm{HA}}$ can be calculated by nonlinear least squares curve-fitting of the common resonance chemical shift data to Equation 16.

If the molecule has a unique resonance for each of the two acidic groups, the fractional deprotonation data obtained from the unique resonances can be combined to give the average number of protons per acid molecule, $p$.

$$p = 2 - f_{i,d} - f_{j,d} \tag{17}$$

In terms of the species present, $p$ is defined by Equation 18,

$$p = \frac{2[\mathrm{H_2A}] + [\mathrm{HA}]}{[\mathrm{H_2A}] + [\mathrm{HA}] + [\mathrm{A}]} \tag{18}$$

which leads to

$$p = \frac{2a_{\mathrm{H}^+}^2 + K_1a_{\mathrm{H}^+}}{a_{\mathrm{H}^+}^2 + K_1a_{\mathrm{H}^+} + K_1K_2} \tag{19}$$

when $[\mathrm{H_2A}]$ and $[\mathrm{HA}]$ are expressed in terms of the macroconstants. $K_1$ and $K_2$ can be obtained from $p$ vs. pH data by nonlinear least squares curve-fitting to Equation 19. The pH values at $p$ equal to 1.5 and 0.5 are good initial estimates of $pK_1$ and $pK_2$ for use in the nonlinear least squares curve-fitting.

Ethylenediaminemonoacetic Acid. The chemical shifts of the methylene and ethylene protons of EDMA are shown as a function of pH in Figure 1. The methylene protons are a single line over the pH range shown while the resonance pattern for the ethylene protons varies in complexity from a single line to an AA'BB' multiplet; the chemical shift of the center of the multiplet is plotted in Figure 1. Since the effect of deprotonation on the chemical shift of carbon-bonded protons is attenuated rapidly as the number of bonds between the site of deprotonation and the carbon-bonded protons increases, the change in chemical shift of the methylene proton resonance as the pH is increased from 4 to 12 is due to deprotonation of the secondary ammonium group, while the change in chemical shift of the center of the ethylene multiplet reflects deprotonation of both ammonium groups. The data in Figure 1 indicate that the two ammonium groups deprotonate simultaneously, as shown in Figure 2. The carboxylic acid group, the secondary ammonium group, and the primary ammonium group are labeled 1, 2, and 3, respectively. $pK_1$ for the carboxylic acid group and the chemical shifts of the methylene protons of fully protonated EDMA and diprotonated EDMA were determined to be 1.86 ± 0.01, 4.070 ppm, and 3.703 ppm, respectively, by curve-fitting chemical shift data for the methylene protons at pH less than 4 to Equation 20.

$$\delta_{obsd} = \frac{\delta_{\mathrm{H_3A}}a_{\mathrm{H}^+} + \delta_{\mathrm{H_2A}}K_1}{a_{\mathrm{H}^+} + K_1} \tag{20}$$

Macroconstants $pK_2$ and $pK_3$ were calculated to be 6.80 ± 0.01 and 9.97 ± 0.01 from the ethylene resonance chemical shift data in Figure 1 by the common resonance method described above.

The microconstants for EDMA were calculated by curve-fitting the fractional deprotonation data for the secondary ammonium group to Equation 14, using for $pK_2$ and $pK_3$ the values calculated from the ethylene resonance. The results are listed in Table I (Method B). The fractional deprotonation of the secondary ammonium group as a function of pH was calculated from the chemical shift data in Figure 1 with Equation 11. The initial values of the microconstants used by the curve-fitting program KINET, also listed in Table I (Method A), were obtained from the pM plot in Figure 3.

Figure 1. pH dependence of the chemical shifts of the carbon-bonded protons of ethylenediaminemonoacetic acid in a 0.02 M aqueous solution

Figure 2. Microscopic acid dissociation scheme for the ammonium groups of ethylenediaminemonoacetic acid

Figure 3. $pM_2$ vs. $f_{2,d}$, the fractional deprotonation of the secondary ammonium group, for EDMA

Table I. Microscopic Acid Dissociation Constants of Ethylenediaminemonoacetic Acid (a, b)

Method (c)      A               B
$pk_{12}$       6.91            6.94 ± 0.01
$pk_{13}$       7.32            7.39 ± 0.01
$pk_{132}$      9.55            9.39 ± 0.02
$pk_{123}$      9.96            9.85 ± 0.01
SS (d)          5.60 × 10^-3    3.27 × 10^-4

(a) 0.02 M EDMA, 0.10 M KCl, μ = 0.16 M, 25 °C. (b) Mixed activity-concentration constants. To convert to concentration constants, subtract 0.09 from each $pk$. (c) Method A: $pM_2$ vs. $f_{2,d}$. Method B: curve-fitting Equation 14 using KINET and $pK_2$ = 6.80 and $pK_3$ = 9.97 (from common ethylene resonance). (d) Sum of squares of residuals = $\sum (f_{2,d}^{calcd} - f_{2,d}^{obsd})^2$.

Lysine. The chemical shifts of the protons on the α and ε carbons of lysine are shown as a function of pH in Figure 4. Since the carboxylic acid group is completely deprotonated at pH 5, the pH dependence of the chemical shifts of the protons on the α and ε carbons is due to deprotonation of the ammonium groups. The proton on the α carbon is two bonds removed from the α ammonium group and six bonds removed from the ε ammonium group. Thus, the change in chemical shift of the proton on the α carbon as the pH is increased is due to deprotonation of the α ammonium group. Similarly, the resonance for the two protons on the ε carbon is a unique resonance for the ε ammonium group, from which its fractional deprotonation can be obtained as a function of pH. The chemical shift data in Figure 4 indicate that, to a small extent, deprotonation of the two ammonium groups overlaps, as depicted in Figure 5. The carboxylic acid group, the α ammonium group, and the ε ammonium group are labeled 1, 2, and 3, respectively.

Figure 4. pH dependence of the chemical shifts of the protons on the α and ε carbons of lysine in a 0.19 M aqueous solution

Figure 5. Microscopic acid dissociation scheme for the ammonium groups of lysine

Figure 6. $p$ vs. pH for lysine

Figure 7. $pM_2$ vs. $f_{2,d}$, the fractional deprotonation of the α ammonium group, and $pM_3$ vs. $f_{3,d}$, the fractional deprotonation of the ε ammonium group, for lysine

Macroconstants $pK_2$ and $pK_3$ were determined to be 9.27 ± 0.01 and 10.81 ± 0.01 by combining the fractional deprotonation data from the two unique resonances to give the average number of acidic protons per molecule ($p$) as a function of pH (Figure 6), and curve-fitting to Equation 19. These macroconstants were then used in the independent curve-fitting of the fractional deprotonation data for each of the ammonium groups to Equation 14 to obtain the microconstants listed in Table II (Method B). The initial values obtained from the pM plots (Figure 7) also are listed in Table II (Method A).
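The fit to Equation 14 can be illustrated with a short Python sketch. It is not KINET and not the authors' procedure. Because Equation 14 is linear in $k_i$ and in the product $k_ik_{ij}$ once $K_1$ and $K_2$ are held fixed, the sketch swaps in an ordinary linear least-squares solve. It is applied to synthetic, noise-free data generated from the Table I (Method B) values rather than to the measured chemical shifts. The function names (`frac_deprot`, `pM`, `micro_from_fit`, `fit_micro`), the rescaling by `s`, and the pH grid are all assumptions made for illustration.

```python
import math


def frac_deprot(pH, k_i, k_ij, K1, K2):
    """Fractional deprotonation of group i from Equation 14."""
    a_h = 10.0 ** (-pH)
    return (k_i * a_h + k_i * k_ij) / (a_h * a_h + K1 * a_h + K1 * K2)


def pM(pH, f_d):
    """Equation 15: pM_i = pH - log10(f_d / (1 - f_d))."""
    return pH - math.log10(f_d / (1.0 - f_d))


def micro_from_fit(k_i, k_ij, K1):
    """Remaining microconstants from Equations 7 and 9."""
    k_j = K1 - k_i
    k_ji = k_i * k_ij / k_j
    return k_j, k_ji


def fit_micro(pH_values, f_values, K1, K2):
    """Least-squares k_i and k_ij with K1 and K2 held fixed.

    Assumption for illustration: Equation 14 is solved as a linear
    problem in A = k_i/s and B = k_i*k_ij/s**2, where s = sqrt(K1*K2)
    rescales the hydrogen ion activity so the regressors are of order 1.
    """
    s = math.sqrt(K1 * K2)
    r = K1 / s
    s11 = s12 = s22 = t1 = t2 = 0.0
    for pH, f in zip(pH_values, f_values):
        u = 10.0 ** (-pH) / s      # scaled hydrogen ion activity
        d = u * u + r * u + 1.0    # scaled denominator of Equation 14
        x1, x2 = u / d, 1.0 / d    # model: f = A*x1 + B*x2
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * f
        t2 += x2 * f
    det = s11 * s22 - s12 * s12    # 2x2 normal equations
    A = (t1 * s22 - t2 * s12) / det
    B = (s11 * t2 - s12 * t1) / det
    k_i = A * s
    k_ij = B * s / A
    return k_i, k_ij


# Illustrative inputs taken from Table I (Method B) for EDMA. K1 and K2 of
# the two-group scheme correspond to pK2 and pK3 of the molecule.
K1, K2 = 10.0 ** -6.80, 10.0 ** -9.97
k_12, k_123 = 10.0 ** -6.94, 10.0 ** -9.85

pH_grid = [5.0 + 0.25 * n for n in range(29)]  # pH 5 to 12
f_synth = [frac_deprot(p, k_12, k_123, K1, K2) for p in pH_grid]

fit_k12, fit_k123 = fit_micro(pH_grid, f_synth, K1, K2)
k_13, k_132 = micro_from_fit(fit_k12, fit_k123, K1)
print(-math.log10(fit_k12), -math.log10(fit_k123))
```

With real data the residuals are not zero, and a nonlinear least-squares program such as KINET also supplies the uncertainty estimates quoted in the tables, which this sketch does not attempt. Because the published constants are rounded to two decimals, values derived from them with `micro_from_fit` need not agree with the tabulated ones in the last digit.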

DISCUSSION

Determination of Microscopic Acid Dissociation Constants. Several methods have been described in the literature for the calculation of microscopic constants from fractional deprotonation data (3, 4, 8-11), including two based on nonlinear least squares curve-fitting (8, 11). Niebergall, Schnaare, and Sugita described the calculation of $k_i$, $K_1$, and $K_1K_2$, from which the other microconstants are then calculated, by nonlinear least squares curve-fitting of uv fractional deprotonation data to an equation derived from Equation 14 by introducing $K_1K_2$ into the numerator (11), while Shrager et al. used a model equation for NMR data in terms of the observed chemical shift of the unique resonance (model 2 in Ref. 8). The calculation procedure developed in this paper differs from those of Niebergall et al. (11) and Shrager et al. (8) in that it treats the macroscopic constants as known constants in the curve-fitting. As illustrated by the EDMA and lysine examples, this is no limitation when the fractional deprotonation data are obtained by NMR; the macroscopic constants generally can be obtained from data from the same experiment and thus for the same experimental conditions.

The microconstants for EDMA and lysine have not been reported previously, and thus the results in Tables I and II cannot be judged by comparison. However, it can be noted from Table II that the same results are obtained from the separate analysis of the two sets of unique resonance data for lysine by curve-fitting to Equation 14 (Method B). In the analysis of the α ammonium data, $pk_{12}$ and $pk_{123}$ are obtained by curve-fitting and then $pk_{13}$ and $pk_{132}$ are calculated from $pk_{12}$, $pk_{123}$, and $pK_2$, while in the analysis of the ε ammonium data, $pk_{13}$ and $pk_{132}$ are obtained by curve-fitting. Curve-fitting of the fractional deprotonation data for EDMA to the equation derived by Niebergall et al. (11) yields values for $k_{12}$, $K_2$, and $K_2K_3$ which are identical to those obtained from the common ethylene resonance and the fractional deprotonation data. However, analysis of the fractional deprotonation data for lysine by the method of Niebergall was unsuccessful, with meaningless negative values being obtained for some constants. The results in Table II indicate that the calculation procedure described in this paper, when used with the curve-fitting program KINET, can give results for systems having $k_i/k_j$ ratios up to at least 8.

The microconstants obtained from the NMR data by the pM method of Edsall, Martin, and Hollingworth (4), the method most frequently used to determine microconstants from uv data, are listed in Tables I and II (Method A). For EDMA, $pk_{12}$ can be obtained with some certainty by this method since $pM_2$ is changing gradually at $f_{2,d} < 0.5$ (Figure 3) whereas $f_{2,d}$ changes more rapidly at larger values and the $pk_{132}$ obtained by extrapolation is somewhat more uncertain, as are the values calculated for $pk_{13}$ and $pk_{123}$ from these microconstants. Similarly, for lysine $pk_{12}$ can be obtained with confidence from the plot of $pM_2$ vs. $f_{2,d}$ (Figure 7) and $pk_{123}$ from the plot of $pM_3$ vs. $f_{3,d}$, whereas the values of $pk_{132}$ from the $pM_2$ vs. $f_{2,d}$ plot and $pk_{13}$ from the $pM_3$ vs. $f_{3,d}$ plot are less certain because of the extrapolations involved. As judged by the criterion of least sum of the squares of the residuals, the microconstants obtained by curve-fitting provide the best fit to the data. Also, the macroconstants calculated from the microconstants obtained from the pM plots do not agree with those obtained from the common ethylene resonance of EDMA or the combined unique resonance data of lysine.

Table II. Microscopic Acid Dissociation Constants of Lysine (a, b)

Method (c)      A (d)     A (e)     B (d)            B (e)
$pk_{12}$       9.34      9.56      9.32 ± 0.01      9.32 ± 0.01
$pk_{13}$       10.38     10.15     10.21 ± 0.01     10.19 ± 0.01
$pk_{132}$      9.95      10.22     9.87 ± 0.02      9.89 ± 0.01
$pk_{123}$      10.99     10.81     10.76 ± 0.01     10.76 ± 0.02

(a) 0.19 M lysine, μ = 0.19-0.40 M, 25 °C. (b) Mixed activity-concentration constants. To convert to approximate concentration constants, subtract 0.10 from each $pk$. (c) Method A: $pM_x$ vs. $f_{x,d}$. Method B: curve-fitting Equation 14 using KINET and $pK_2$ = 9.27 and $pK_3$ = 10.81 (from combined fractional deprotonation data). (d) Using fractional deprotonation data for the α ammonium group. (e) Using fractional deprotonation data for the ε ammonium group.

The accuracy of microconstants obtained by the pM method can be limited by the necessary extrapolations of the pM plots. Fung and Cheng (10) described a modification of this method for systems whose microconstants are such that a large uncertainty would result in either the $pk_i$ or $pk_{ji}$ because of the extrapolation. Specifically, when $k_j/k_i$ is very large or very small, $pk_i$ or $pk_{ji}$, respectively, can be obtained with certainty from the pM plot. The $pk_i$ or $pk_{ji}$ is then treated as a constant in the determination of the other microconstants. The microconstants evaluated for EDMA and lysine by the modified pM method provided only a slightly better fit to the data than those obtained by the pM method, as indicated by the sum of the squares of the residuals. Niebergall et al. (11) have described a method for the evaluation of microconstants which is based on a linear form of Equation 14 and uses a value for $K_1$ from other experiments. The microconstants obtained by this method for EDMA and lysine fit the data better than those from the pM and modified pM methods but not as well as those obtained by curve-fitting.

Determination of Macroscopic Acid Dissociation Constants. The macroscopic acid dissociation constants of diprotic acids having a common resonance or two unique resonances can be evaluated by curve-fitting to Equations 16 and 19, respectively. For comparison with the $pK_2$ = 9.27 and $pK_3$ = 10.81 obtained for lysine by curve-fitting $p$ vs. pH (Figure 6) to Equation 19, values of $pK_2$ = 9.21 and $pK_3$ = 10.81 were estimated for an ionic strength of 0.2 M from the thermodynamic constants reported by Hay and Morris (16).

The chemical shift of the common resonance of the macroscopic monoprotonated species, HA, which is obtained simultaneously with the macroconstants in fitting to Equation 16, is the weighted average of the chemical shifts of the two microscopic monoprotonated species. If the chemical shifts of the two protonation isomers can be estimated, for example using model compounds, the distribution between the two forms, and thus the microconstants, can be calculated. However, the magnitudes of microconstants so calculated are very sensitive to the estimates used for the chemical shifts of the microscopic monoprotonated forms. The microconstants estimated by curve-fitting common resonance chemical shifts to a model in which the observed chemical shift is expressed in terms of the chemical shifts of the four protonation states are also very sensitive to the values used for the chemical shifts of the two intermediate protonation states (Table IV, Ref. 8).

Other workers have determined macroconstants for polyprotic acids by computer fitting common resonance chemical shift data to a model which treats the deprotonation as a sum of simple proton association equilibria (17, 18). Their model equation is of the form

$$\delta_{obsd} = \delta_{min} + \sum_i \Delta_i\,\frac{10^{(\mathrm{pH} - pK_i)}}{1 + 10^{(\mathrm{pH} - pK_i)}} \tag{21}$$

where $\delta_{min}$ is the chemical shift of the fully-protonated form and $pK_i$ and $\Delta_i$ are the acid dissociation constant and change in chemical shift, respectively, for the $i$th protonation transition. The chemical shift titration curves are computer fitted to obtain both the $pK$ and the $\Delta$ for each deprotonation. For this equation to be applicable, the fraction represented by $10^{(\mathrm{pH} - pK_i)}/[1 + 10^{(\mathrm{pH} - pK_i)}]$, which in terms of the species present corresponds to $[\mathrm{H}_{n-i}\mathrm{A}]/([\mathrm{H}_{n-i+1}\mathrm{A}] + [\mathrm{H}_{n-i}\mathrm{A}])$, must be the fraction from which the $i$th proton has been removed. This will be the case only if the stepwise deprotonation reactions do not overlap. For example, if $pK_1$ and $pK_2$ of a diprotic acid differ by more than 3 pK units, $10^{(\mathrm{pH} - pK_i)}/[1 + 10^{(\mathrm{pH} - pK_i)}]$, where $i$ = 1 or 2, is indeed the fraction from which the $i$th proton has been removed to within one part per thousand. As the difference between $pK_1$ and $pK_2$ decreases, the error in the fractional deprotonation given by $10^{(\mathrm{pH} - pK_i)}/[1 + 10^{(\mathrm{pH} - pK_i)}]$ increases. The errors in fractional deprotonation for the first and second deprotonation of a diprotic acid when the deprotonation is treated as the sum of two simple association equilibria are shown in Figure 8 as a function of pH for different values of the ratio $K_1/K_2$. These considerations indicate that pK's calculated by curve-fitting to Equation 21 can be in error when the successive pK's differ by less than 3 pK units.

Figure 8. Fractional deprotonation errors inherent in the model used for a diprotic acid in Equation 21 (17, 18). The numbers by the curves are the ratio $K_1/K_2$. The fractional deprotonation error is defined as the fractional deprotonation predicted by the model used in Equation 21 minus the true fractional deprotonation. The model gives negative fractional deprotonation errors for titration of the first proton (the lower set of curves) and positive errors for titration of the second proton (the upper set of curves).
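The error described for Figure 8 can be checked numerically. The Python sketch below compares the simple-equilibrium fraction used in Equation 21 with the true fractions for a diprotic acid, taken here as ([HA] + [A])/total for the first proton and [A]/total for the second. The function names (`true_fractions`, `model_fraction`, `max_abs_error`), the scan range from $pK_1 - 3$ to $pK_2 + 3$, and the 0.01 pH step are illustrative assumptions, not part of the original work.

```python
def true_fractions(pH, pK1, pK2):
    """True fractions of a diprotic acid with the first and second proton removed."""
    a_h = 10.0 ** (-pH)
    K1, K2 = 10.0 ** (-pK1), 10.0 ** (-pK2)
    d = a_h * a_h + K1 * a_h + K1 * K2
    first = (K1 * a_h + K1 * K2) / d   # ([HA] + [A]) / total
    second = K1 * K2 / d               # [A] / total
    return first, second


def model_fraction(pH, pK):
    """Simple-equilibrium fraction used in Equation 21."""
    x = 10.0 ** (pH - pK)
    return x / (1.0 + x)


def max_abs_error(pK1, pK2, step=0.01):
    """Largest |model - true| for either proton over a scan of pH."""
    start = pK1 - 3.0
    n = int(round((pK2 + 3.0 - start) / step))
    worst = 0.0
    for i in range(n + 1):
        pH = start + i * step
        t1, t2 = true_fractions(pH, pK1, pK2)
        e1 = model_fraction(pH, pK1) - t1   # error for the first proton
        e2 = model_fraction(pH, pK2) - t2   # error for the second proton
        worst = max(worst, abs(e1), abs(e2))
    return worst


# Scan a few pK separations; pK1 = 5 is an arbitrary illustrative choice.
for delta_pK in (1.0, 2.0, 3.0):
    print(delta_pK, max_abs_error(5.0, 5.0 + delta_pK))
```

In this sketch a separation of 3 pK units keeps the largest error just under one part per thousand, while a separation of 1 pK unit gives errors of several percent, in line with the statement that Equation 21 becomes unreliable when successive pK's differ by less than 3 units.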

ACKNOWLEDGMENT

The authors are indebted to J. L. Dye and V. A. Nicely for the nonlinear least squares program KINET and to Mark Greenberg for modifications necessary for its use on the University of Alberta computer. The authors also thank C. A. Evans for preparing the ethylenediaminemonoacetic acid.

LITERATURE CITED

(1) R. L. Ryklan and C. L. A. Schmidt, Arch. Biochem., 5, 89 (1944).
(2) D. P. Wrathall, R. M. Izatt, and J. J. Christensen, J. Am. Chem. Soc., 86, 4779 (1964); 87, 5809 (1965).
(3) R. E. Benesch and R. Benesch, J. Am. Chem. Soc., 77, 5877 (1955).
(4) J. T. Edsall, R. B. Martin, and B. R. Hollingworth, Proc. Natl. Acad. Sci. USA, 44, 505 (1958).
(5) E. L. Elson and J. T. Edsall, Biochemistry, 1, 1 (1962).
(6) A. Loewenstein and J. D. Roberts, J. Am. Chem. Soc., 82, 2705 (1960).
(7) N. E. Rigler, S. P. Bag, D. E. Leyden, J. L. Sudmeier, and C. N. Reilley, Anal. Chem., 37, 872 (1965).
(8) R. I. Shrager, J. S. Cohen, S. R. Heller, D. H. Sachs, and A. N. Schechter, Biochemistry, 11, 541 (1972).
(9) D. L. Rabenstein, J. Am. Chem. Soc., 95, 2797 (1973).
(10) H-L. Fung and L. Cheng, J. Chem. Educ., 51, 106 (1974).
(11) P. J. Niebergall, R. L. Schnaare, and E. T. Sugita, J. Pharm. Sci., 61, 232 (1972).
(12) Y. Fujii, E. Kyuno, and R. Tsuchiya, Bull. Chem. Soc. Jpn., 43, 786 (1970).
(13) C. W. Davies, "Ion Association", Butterworths, Washington, D.C., 1962, p 39.
(14) L. Meites, "Handbook of Analytical Chemistry", McGraw-Hill, New York, N.Y., 1963, pp 1-8.
(15) J. L. Dye and V. A. Nicely, J. Chem. Educ., 48, 443 (1971).
(16) R. W. Hay and P. J. Morris, J. Chem. Soc., Perkin Trans. 2, 1021 (1972).
(17) A. R. Quirt, J. R. Lyerla, Jr., I. R. Peat, J. S. Cohen, W. F. Reynolds, and M. H. Freedman, J. Am. Chem. Soc., 95, 570 (1974).
(18) M. H. Freedman, J. R. Lyerla, Jr., I. M. Chaiken, and J. S. Cohen, Eur. J. Biochem., 32, 215 (1973).

RECEIVED for review January 23, 1976. Accepted March 15, 1976. This research was supported in part by a grant from the National Research Council of Canada and by the University of Alberta. Financial support to T.L.S. by a National Research [...] of Canada and by an I. W. [...] Fellowship is gratefully acknowledged.

Simplex Pattern Recognition Applied to Carbon-13 Nuclear Magnetic Resonance Spectrometry

Thomas R. Brunner, Charles L. Wilkins,* T. Fai Lam, Leonard J. Soltzberg, and Steven L. Kaberline

Department of Chemistry, University of Nebraska-Lincoln, Lincoln, Neb. 68588

Linear discriminant functions for recognition and prediction of three common organic structural features via examination of proton noise-decoupled carbon-13 nmr spectra have been developed using a modified simplex algorithm. The functions, designed to be used routinely by an nmr spectroscopist, were derived from training sets containing several hundred spectra. Subsequently, the functions were used to interpret approximately 2000 spectra in order to predict the presence or absence of each of the three features for the compounds whose spectra were examined. It is shown that the simplex-based functions are superior to linear learning machine functions for prediction. These results indicate the potential of the simplex method for generating threshold logic units to be incorporated in an on-line pattern recognition system for nmr.

Previous investigations (1-3) have shown that the linear learning machine method (4) can be used to develop a spectral interpretation system for proton noise-decoupled carbon-13 high resolution nmr spectra. Furthermore, multiple discriminant function analysis, using a committee consensus and various preprocessing algorithms, increased the reliability of the method in predicting the presence or absence of various functional groups (5). The linear learning machine approach is a computationally economical and convenient technique, and the resulting functions are well-suited for incorporation in an on-line interpretation system. Thus, spectroscopists can use the method without significant increases in either experiment time or cost. Complete discussions of the merits of the method, its operational principles, and comparisons with other pattern recognition methods are contained in several recent review articles (6-9).

However, the linear learning machine technique also has certain disadvantages. One major disadvantage is the inability of the method to yield optimum pattern classifiers in cases when the data are not linearly separable or not sufficiently representative of the classes. As the classification problem becomes more difficult (e.g., for subtle spectral interpretation questions) the condition that data be linearly separable becomes increasingly unrealistic. Reliability of threshold logic units computed from inseparable data in this way is highly dependent upon the conditions immediately prior to terminating computation. Most often, the computation is terminated after the expenditure of a predetermined arbitrary amount of computer time or after the completion of an arbitrary number of error correction feedback iterations. A second disadvantage is the lack of any convenient means of assuring that, for separable data, the linear discriminant is the best possible one, which might be defined as that which gives the most accurate results for unknowns (best prediction, as opposed to best recognition). A previous study of mass spectral interpretation (10) has shown that a modified sequential simplex method (11, 12) offers promise of overcoming these problems. In this paper, the application of simplex pattern recognition to nmr data is reported.

EXPERIMENTAL

Data Bases. Three data bases were employed. These were: a published collection of 500 carbon-13 nmr spectra (A) (13); a collection of 99 13C spectra determined in our laboratories (B); and a recent collection of 1767 13C spectra obtained from the literature (C) (14). Chemical shifts were referenced to internal tetramethylsilane and covered a range of approximately 200 ppm. All spectra were proton noise-decoupled. Only collections A and B contained intensity information, which was digitized to integer values between 1 and 100. In each spectrum, the most intense peak was assigned an intensity of 100 and the remaining peaks were encoded relative to that peak. We call this representation absolute intensity encoding (AI). An alternate coding wherein each spectrum within a training set has its intensities normalized to sum to 100 was called normalized absolute intensity coding (NAI). Binary coding (designated PNP, peak-no-peak) was also used by assigning the value "1" to each resolution element possessing a peak and "0" to those not containing peaks. Because Collection C contained only chemical shifts, PNP coding was the only form used for that set. Collection C contained a large number (ca. 15%) of spectra containing fewer peaks than the theoretical number expected. Collection A included 80 spectra measured in the continuous-wave mode and 420 spectra obtained in the Fourier transform mode. Collection B spectra were measured in the Fourier transform mode using a Varian XL-100-15 spectrometer equipped with 16K word Varian 620/i computer and a Sykes cassette tape unit for mass storage. Collection C contained spectra measured in both continuous-wave and Fourier transform modes. Duplicate spectra were removed from the three data sets. Since Collection C contained no intensities, when duplicates were found, Collection C spectra were eliminated.

Preprocessing of NAI data via Fourier transformation to produce simulated free induction decay data for simplex analysis was as described previously (3). Training sets were drawn from Collection A and contained 400 compounds (200 with methyl) for the methyl functional group question, 340 compounds (167 with carbonyl) for the carbonyl determination, and 268 compounds (130 with phenyl) for the phenyl functional group determination. Prediction sets were obtained by using all of Collections B and C, together with any spectra from Collection A not used for training. This procedure resulted in a 2098 spectrum set (with 545 phenyl spectra) for the phenyl question, a 2026 spectrum set containing 471 carbonyl (C=O attached to anything) spectra for the carbonyl question, and 1966 spectra containing 1456 spectra from methyl compounds for the methyl question.

Computations. Programs for both linear learning machine and modified sequential simplex calculations were written in FORTRAN IV and all computations were performed using an IBM 360/66 computer.
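The three spectral codings described under Data Bases (AI, NAI, and PNP) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' FORTRAN IV code. The function names, the 1 ppm resolution element, the 0 to 200 ppm window, and the choice to normalize the raw intensities for NAI are all assumptions, since the text does not specify them.

```python
import math


def ai_encode(intensities):
    """Absolute intensity (AI): strongest peak -> 100, others scaled to integers 1..100."""
    top = max(intensities)
    return [max(1, round(100.0 * x / top)) for x in intensities]


def nai_encode(intensities):
    """Normalized absolute intensity (NAI): intensities rescaled to sum to 100."""
    total = float(sum(intensities))
    return [100.0 * x / total for x in intensities]


def pnp_encode(shifts_ppm, width_ppm=1.0, span_ppm=200.0):
    """Peak/no-peak (PNP): 1 for each resolution element holding a peak, else 0.

    width_ppm and span_ppm are hypothetical values chosen for illustration.
    """
    n = int(span_ppm / width_ppm)
    code = [0] * n
    for shift in shifts_ppm:
        idx = math.floor(shift / width_ppm)
        if 0 <= idx < n:           # peaks outside the window are ignored
            code[idx] = 1
    return code


# Hypothetical three-peak spectrum: raw intensities and chemical shifts (ppm).
raw = [5.0, 10.0, 2.5]
shifts = [14.2, 128.7, 128.9]
print(ai_encode(raw))
print(nai_encode(raw))
print(sum(pnp_encode(shifts)))
```

Two peaks that fall in the same resolution element collapse to a single "1" under PNP coding, which is one reason the binary representation carries less information than AI or NAI.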