Environ. Sci, Techno/. 1995, 29, 413-425
Regularized Least-Squares Methods for the Calculation of Discrete and Continuous Affinity Distributions for Heterogeneous Sorbents+ MIROSLAV t E R N I K AND MICHAL BORKOVEC* Institute of Terrestrial Ecology, Federal Institute of Technology, Grabenstrasse 3, 8952 Schlieren, Switzerland
J O H N C. WESTALL Department of Chemistry, Oregon State University, Corvallis, Oregon 97331
From experimental isotherms, w e calculate discrete and continuous affinity distribution functions for environmental sorbents. The calculation of such distributions is an ill-posed problem,which, if approached by classical solution schemes, leads to serious numerical problems and instabilities. The application of regularization methods circumvents these difficulties and provides a substantial stabilization of the resulting distributions. W e show that, without additional information aboutthe affinity distribution, it is generally impossible to recover the true affinity distribution from realistic experimental data. As this information is usually not available, such data analysis remains a fitting exercise, which may however have an enormous practical value. The stability of the resulting distributions offers novel possibilities for comparison of sorption properties of environmental materials.
Introduction Sorption of chemicals from the aqueous phase to heterogeneousenvironmentalsorbents such as humic substances, minerals, soils, sediments, aquifer, or rock materials plays a central role in many disciplines of environmentalsciences. For these materials, the amount adsorbed often varies gradually even if the concentration in solution is changed by many orders of magnitude. This typical behavior for heterogenous sorbents has already been interpreted by Langmuir as a superposition of independent sorption sites, characterized by an affinity distribution (1). However, the determination of such affinity distributions (or affinity spectra) from actual experimental data turns out to be far from trivial. Since the early attempts (2, 31, this problem has received much attention within gas adsorptionliterature but also in biochemical and environmental sciences (4-6). Nevertheless, many of the proposed approaches are often unreliable, unstable, and can yield very different results even when applied to the same experimental isotherm. Such difficulties are typical for a class of problems,which are known within the numerical mathematics community as ill-posed (7). The science (and art) to solve such “insoluble” problems has evolved into an active field of research, and much know-how and a large collection of mature algorithms have become available (8-10). Some of these methods have been appliedwith success to a variety of similar problems originating from a wide spectrum of scientific disciplines (7) such as image reconstruction (111, determination of particle size distribution functions from scattering intensities (12,131,and evaluation of rate constant distributions from chemical kinetics data (14). Even though the calculation of affiiity distributions from sorption isotherms represents essentially the same problem, these methods have been adopted in this field rather slowly, and to our knowledge, only in the context of gas adsorption (4, 15, 16). The difficultyin the calculation of the affinity distribution function from an experimental sorption isotherm is that one attempts to invert an inhomogeneousFredholm integral equation of the first kind subject to substantial (experimental) noise. This problem is known to be severely illposed. The practical implications are easily understandable: (i) there are many different affinity distributions, all of which lead to a good fit of the data; (ii) “simple”solution schemes will often fail; and (iii)if one obtains results, minor changes in the experimental data points may lead to enormous changes in the determined distribution (8). As we shall see, the application of modern numerical procedures avoids most of these difficulties, but not without a price. As the problem is ill-posed, many different affinity distributions lead to a good fit of the data; therefore, one must somehow decide which distribution is the optimal or the “best”one. The information necessary to do this is not contained in the experimental isotherm, but must be supplied from elsewhere. For example, if one knows that the measured titration curve originates from a mixture of a few low molecular +
Dedicated to Hans Sticher on the occasion of his 60th birthday.
* To whom correspondence should be addressed.
0013-936~95/0929-0413$09.00/0
D 1995 American Chemical Society
VOL. 29, NO. 2, 1995 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
413
weight oligoprotic acids (this information may originate from chemical analysis of the sample), then the optimal solution must be composed from a few discrete sites. For a heterogeneous humic substance which contains many different types of sites, one may argue that the most realistic affinity distribution should be continuous and smooth. If, on the other hand, one would like to incorporate the protonation behavior of this humic substance into an existing speciation code which can handle discrete sites only, the optimal distribution would no longer be the smooth one but rather discrete and contain as few sites as possible. The characteristics of the optimal distribution depend on the application, the availability of additional information, and, naturally, on subjective preference. In this paper, we demonstrate that regularized leastsquares methods can help to resolve such problems. We show how these methods can be applied for the calculation of affinitydistribution functions from experimental sorption isotherms of chemicals sorbed from the aqueous phase onto common heterogeneous environmental sorbents. Previously published sorption data of various cations and acid-base titration curves are reanalyzed with the present methods.
Numerical Methods The sorption isotherm q(c),which represents the adsorbed amount as a function of concentration in solution c, can be written for M different, independent sites as a linear superposition of Langmuir isotherms (1, 5, 6):
where s, and 4are the concentration and affinity constant of the sitej, respectively. We shall always use mole per liter as basic units for the solution concentration c (implying that %has units of Llmol). The units of the site concentrations sj are the same as the units of the adsorbed amount q, which usually differs from case to case (mol/g, dimensionless, etc.). In the limit M - m, the continuous form of eq 1 is obtained (4-6):
(2) wherefllog 4)= sjlA is the affinity distribution or affinity spectrum and A = log I(+l - log I$ is the grid spacing, which approaches zero in this limit (log denotes the common logarithm with base 10). Note that the derivative of the isotherm eq 1 obeys the inequality 0 Idqldc 5 q l c
(3)
which means that the slope on a doubly logarithmic plot lies between 0 and 1. Isotherms that do not satisfy this inequality cannot be represented by either eq 1 or eq 2. The basic problem is to determine the affinity distribution from an experimental record of the sorption isotherm q(c). The experimental data set generally consists of Npairs of data points: ci and qifor i = 1, ..., 1c’
which contain experimental errorswith (usually unknown) 414 1 ENVIRONMENTAL SCIENCE &TECHNOLOGY / VOL. 29, NO. 2, 1995
standard deviations of uc,i and Uqji respectively. One attempts to evaluate the affinity distribution which is characterized by the concentrations of the sites s, for a given setofaffinityconstantsq cj= 1,...,MI orwithitscontinuous equivalent, the function Jlog K ) . The classical solution scheme for such problems is the least-squares method. Drawbacks of Classical Least-Squares Methods. We focus on the solution of discrete eq 1 since the solution of the continuous version of eq 2 can be approximated by the solution of the discrete problem with a closely spaced grid of affinity constants. Furthermore, let us assume that the affinity constants (i = 1, ..., M) are given and the application of the least-squares method reverts to the problem of finding the corresponding site concentrations s, such that the deviations between the experimental and calculated values are small. Commonly, one considers the objective function
(4)
where the standard deviation ui is given by the error propagation formula:
(5) For a large number of independent random variables, the distribution of x2 approaches a normal distribution with mean Nand variance 2N. The common solution scheme is to locate the minimum of the objective function, namely, one attempts to
Many algorithms exist for performing this minimization of a nonlinear function of several variables (9, 10, 17). As long as the number of sites is small, this simple approach of eq 6 may be all that is necessary for a satisfactory result. But for larger number of sites, several problems arise. The data may be “overfit”to the point that some of the random variation, which is caused by the experimental error, is also fitted by the model. In this case, the resulting values of x2 become very small and statistically very unlikely. In addition to that, substantial sensitivity to initial parameter guesses and difficulties in attaining convergence may be encountered. The reason for these complications is that the optimization problem is very ill-posed. The objective function x2 has very small derivatives in several directions, and there are many affinity distributions that all lead to an equally good fit of the data. Regularized and Constrained Least-Squares Methods. The remedy to these problems lies in amending the classical solution eq 6 through additional constraints and regularizing functions. Their proper choice makes the problem well-posed, stabilizes the solution scheme, and avoids overfitting (7, 8, 18). Nonnegativity Constraints. The most important constraint follows from the simple observation that site concentration s, cannot become negative, which leads to the solution scheme
minimize subject to
x2 (sl,
sum of squares to a range which is statistically very likely. This leads to the solution scheme (9)
..., sM)
si 2 0 f o r j = 1, ..., M
(7)
One must minimize a function of several variables subject to nonnegativity constraints (bounds on the variables) (9). The effect of such nonnegativity constraints is usually sufficiently strong to make resulting affinity distribution unique (seeAppendix). For negligible errors in the solution concentrations (o,,j= 0), quadratic programming methods can be applied to solve eq 7 and to find affinity distributions (19, 201. Even with the nonnegativity constraints, much freedom still remains in choosing the resulting distribution, and eq 7 will generally overfit the data. A proper solution scheme must scrutinize the remaining freedom and select the optimal affinity distribution among all possibilities. Regularizing Functions. In order to bias the affinity distributions toward the desired features of the optimal distribution, one introduces a regularizing function R(sl, ...,SM). This function of all site concentrations Sj is chosen such that its value is small for those distributions which have the desired properties and large for others. For some applications, one may wish to obtain continuous affinity distributions. In most cases, the resulting distribution will be also a smooth function, which can be selected by the regularizing function (8, 12, 18) M- 1
R(s1,..., sM)=
(sj-l - 2sj +
(smoothness)
j=2
(81
which is the sum of squares of finite difference approximations of the second derivatives. This regularizing function is small for affinity distributions, which have small mean curvature, and is large for strongly oscillating functions. For other applications, one might want to describe a given sorption isotherm with as few sites as possible. Therefore, one would like to favor a small number of sites and introduce a penalty for many sites with small concentrations. A regularizing function capable doing this is
subject to
N - j3 f
R(s,,
..., sM)
i 5 x2 (sl, ...,sM) 5 N + /3 f i (11)
and
sj
2 0 f o r j = 1,
..., M
where B is a constant of order unity and is determined by the desired statisticalprobabilitylevel. One has to minimize a function of severalvariables subject to nonlinear inequality and nonnegativity constraints (see Appendix). If experimental errors u,i and uq,iare known, this solution scheme converges to the usually unique solution of the problem. If the experimental errors are not known accurately, they must be treated as adjustable parameters or, more likely, the second method would be used. In the second method, one minimizes a linear combination of the sum of squares and the regularizing function (8, 9)
minimize subject to
x2(s1,..., sM)+ IR(S,, ..., sM) si 2 0 f o r j = 1, ..., M
(12)
where the regularization parameter I =. 0 is chosen as large as possible but consistent with the resulting fit of the data still being compatible with the statistically expected value of x2 (or other statistical criteria, see below). The solution eq 12 requires (as eq 7) the minimization of a nonlinear function of several variables subject to nonnegativity constraints. The proper value of1 is obtained by repeating the optimization for different values of I , Spectral Window. We choose a fixed grid of affinity constants 4 forj = 1, ...,M. Suppose that the measured concentrations ci lie between c m and ,c, and therefore the spectral window for the affinity constants is bounded by
s&sk M
R(s,, ..., sM) =
minimize
log K d = -log c,,
-y
log Kmax= -log c,h
+
i-1
(small number of sites) (9)
(13)
j=lk=l
Finally, much discussion has been devoted to maximum entropy methods, which ought to be able to select the most probable affinity distribution (8, 11). Here one should minimize the negative entropy (13) M
~ ( s , , ..., sM)=
C si In ( s j / s Y )
(maximum entropy)
j= 1
(101
where sy) are fixed, prescribed amplitudes of the so-called “prior image”, which in our case, corresponds to an initial guess of the affinity distribution. Regularized Least-Squares Methods. Given these regularizing functions, we now have to minimize the value of the regularizing function and keep x2 sufficiently small at the same time. Basically, two different methods have been proposed to do this. In the first method, one attempts to minimize the regularizing function and simultaneously to confine the
where y =. 0 is a not too large constant (here we use y = 1). Thus a sensible grid is given by log
5 = log Kd + (i - 1)A for j = 1, ...,M
(14)
and a grid spacing A = (log Km, - log K,.&/(M - 1). Of course, it is possible that sites with affinity constants outside the spectral window exist. However, for data containing experimental errors, these sites cannot be distinguished from sites either with 4 0 or with 4 DO. The first case corresponds just to a simple linear isotherm with a slope while the second case results in an additive constant denoted by so. In order to incorporate all possible sites outside the spectral window, the function for data fitting becomes
-
-
where the affinity constants 4lie within the spectralwindow VOL. 29, NO. 2, 1995 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
415
and are given by eq 14. With the remaining site concentrations sj, the constants &n and SO are now treated as fitting parameters subject to nonnegativity constraints. (As discussed below, one exception is acid-base titration curves, for which SO can become negative.) Dimensionless Quantities. In subsequent sections, we shall report calculated affinity distributions from experimental and model data in dimensionless quantities. For discrete distributions, we s h d refer to the mole factor of a given site
e'J = Sj/Stot
random can be calculated (17, 211, If the probability p drops below a critical level (say 0,051, we must reject the hypothesis that the residuals are unconelated and random. If n+ and n- exceed approximately 10, the distribution of u is well approximated by a normal distribution with mean (4m n ) / 2 nand variance 2m(2m - n ) / [ n * ( n I)] where n = n+ n- and m = n+n-. For small values n+ and nthe critical probabilities are tabulated ( 1 7, 21). Durbin- Watson Test. As an alternative,one can examine ( 1 7, 22)
+
+
(16)
where stat is the total concentration of the sites in the spectral window (excluding sites outside) and is given by M
j= 1
For continuous distributions, we employ the mole fraction distribution function
Both distributions are normalized over the spectral window
M
Lej=1 j= 1
The advantage of these dimensionless distributions is that all information concerning units is contained in three constants stotrso, and &in. Statistical Tests. The solution eq 11 automatically selects the optimal power of the regularization function by incorporating the x2 goodness of fit test. This test can also be used together with eq 12. The x2 test is, however, rather problematic as it requires very accurate estimates of the standard deviations of the experimental errors, which are never available for sorption data. (For example, an error estimate of 1-2% of the value of the experimental data is accurate merely within a factor of 2 and is inadequate for a x 2 test.) Alternative approaches are based on the hypothesis that experimental errors are uncorrelated random variables. In this case, residuals w ill remain uncorrelated as long as the data are properlyfitted or overfittedwhile underfitting leads to trends and correlations in the residuals. Tuning between these extremes can be achieved by varying the weight of the regularizing function, i.e., by changing the (unknown) experimental errors (in eq 11) or by adjusting the regularizationparameterd (in eq 12). We employtwo different tests in order to decide whether the residuals are uncorrelated. Of course, meaningful results are obtained only if the basic assumption of statistical independence of the errors is correct. Test for Runs. One arranges the data in order of increasing values of ci and notes the signs of the differences qj - q(cJ. The total number of plus and minus signs are denoted by n+ and n-, respectively, and u represents the number of runs, which are the sequences without a change in sign. If the sequence of signs is random and uncorrelated, u is large while a small value of u signals correlations. If u runs are observed, the probability p that the residuals are 416
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 29, NO. 2. 1995
where ri = [q,- q(cJ]/oiis the residual. In order to test the null hypothesis of no correlation against the alternative of positive correlation, d is compared with tabulated critical values dLand d ~ If. d > du, the hypothesis of not correlated residuals is accepted; if d < d ~this , hypothesis is rejected. If d~ 6 d 5 du, the test is said to be inconclusive. At a significant level of 0.05, the critical d values lie around 1.5 (see Table 3.2 in ref 17).
Analysis of Generated Data In this section, we study data generated from the isotherm eq 1 with known affinity distributions. Uncorrelated random errors distributed according to a Gaussian distribution with constant relative standard deviation (RSD) are added to the data points. The Gaussian distribution is truncated to three standard deviations on both sides in order to exclude infrequent outliers. Data Generated from a Discrete Affinity Distribution. As an example of a discrete distribution, we investigate adsorption isotherm data with a four-site spectrum discussed previously (23, 24). The spectrum was originally devised to model a titration curve of a mixture of four monoprotic acids but can also be thought to mimic any kind of heterogeneous sorbent. As shown in Figure l a (right), the spectrum spans a rather narrow range and is given by log K1 = 5.55, el = 0.32; log KZ= 4.92, e2 = 0.22; log K3 = 4.14, e3 = 0.16; and log IC4 = 3.20, e4 = 0.30. The adsorbed amount is reported as the mole fraction of the occupied sites relative to the total number of sites (stat = 1). We have generated 30 data points ranging from lo-' up to mol/L (see of Figure la, left) which cover the entire spectrum. A spectral window from log Kmin= 2 to log Kmax= 8 and a grid spacing A = 0.01 or 0.1 is used. Increasing Regularization Power. Data were generated with constant RSD of 0.02 on both the concentration in solution cand adsorbed amount q. The effect of progressive regularization power is shown in Table 1 and in Figure lb-d where resulting residuals (left) and affinity spectra (right) are shown. Figure l b shows the result of a leastsquares fit with nonnegativity constraints only (no additional regularizingfunction, eq 7 ) . One obtains a sevensite spectrum and avalue ofx2/N= 0.74 (see also top data row of Table 1). With a probability of 0.95, the values of x2 should lie within the interval 0.55 < x2/N< 1.56,which is the case. But we have probably overfitted the data, since the x2 value could be larger and closer to the statistically more likely value of unity. Furthermore, one suspects that all peak doublets in Figure l a may not be really essential to obtain a good fit.
0
0
-1
Y
p-l
b4 0
d
d
-2
-
-2.-
I
0-
2
II1 l Il
v)
Q k
I
' 01 -
-8
- 10-1
-8
I
-6
-2
-4
1
I
t
-6
-4
-2
-2-
-8
-2+
-8
-2
-4
1 I1
1,
-6
-4
-2
-6
-4
-2
Q
an 0
H
2
log c (mol/L) FIGURE 1. Isotherm, residuals, and affinity spectra for the discrete four-site model as a function of the common logarithm (basis 10) of the concentration (mol/L) or affinity constants (Umol). (e) Generated experimental data points with relative standard deviation (RSD) of 0.02 (left) with correspondingaffinity spectrum (right). Below, the residuals [qi - q(ci)biare shown (left) and corresponding affinity distributions (right) calculated from generated data. (b) Nonnegativity constraints only. (c) Optimal regularization for the small number of sites. (d) Too strong regularization for the small number of sites. Parameters are given in Table 1.
x2 increases.
TABLE 1
Fitting of Data Generated from Four=Site Model for the Small Number of Sites Regularizing Function for Increasing Regularization Parameter A figure log A sites y/At pb ff Ib
IC
Id
3.0 3.8 4.0 5.0 7.0 9.0
7 6 5 4 3 2 1
0.74 0.74 0.74 1.06 1.10 25.0 26.3
0.85 0.85 0.85 0.61 0.26 0.00" 0.00"
2.76 2.76 2.75 2.31 1.78 0.13" 0.14'
Resulting mean square deviation. Probabilitypforthe test of runs. Durbin-Watson parameter d. Asterisked (+) values indicate statistically significant trends in the residuals. a
These problems can be corrected by applying the small number of sites regularizing function (eqs 9 and 12). As illustrated in Table 1, the increase of the regularization parameter 1 causes the number of sites to decrease while
Judging from the 0.95 probability limits of x 2 / N given above, one immediately concludes that only the fits based on the one- and two-site spectrum are significantly worse that all other fits. Therefore, we select the three-site spectrum (showninFigure IC)with thelargest x 2 / N= 1.10, which is still statisticallylikely, as the optimal (orbest) small number of sites distribution. Since the errors are known, we have also applied solution eq 11, which directly leads to the same optimal three-site distribution as shown in Figure IC. In practice, however, accurate estimates of the errors are not available, and alternative methods must be used to select the optimal distribution. One such qualitative test is immediately apparent from plots of the residuals as shown in Figure 1. One observes that for three sites the residuals show no significant trend (Figure IC)while there is a clear trend for one site (Figure Id). The presence of trends can be investigated in a quantitative fashion by calculating the probability p for the test for runs and the parameter d of the Durbin-Watson test (seeprevious section). The results VOL. 29, NO. 2, 1995 / ENVIRONMENTAL SCIENCE & TECHNOLOGY 1417
SMOOTHNESS
SMALL NUMBER OF SITES
0.4
Q-1a
0
-2-3
-r
(a)
1
RSD =
eO.6 -
I,
0.0
1
-
1
Q
a-l-
8
0
0.3 -
4
-2 1
-8
1
-6
-4
-2
-log K
0.0
-f
-8
-6
-4
-2
-log K
FIGURE 2 Optimal affinity distributions as a function of the common logarithm (basis 10) of affinity constants regularized for small number of sites (left)and for ssootbness (right) for generated data for the four-site spectrum shown in Figure l a for increasing error levels (top down). Values of relative standard deviation (RSD) are given in the figure. The original spectrum can be recovered from the highly accurate data set only.
are shown in Table 1. On a probability level of 0.05, the residuals are correlated if p < 0.05 for the test of runs. On the same probability level, there is no correlation between the residuals if d > 1.49 while for d < 1.35 this hypothesis must be rejected. On the basis of both tests, we conclude that the three-site spectrum is the optimal distribution. Note that the same conclusion was reached above based on theX2value. The basic difference in the two approaches is that the tests based on trends in the residuals do not require reliable estimates of the standard deviations of the experimental data, while the xZ test does depend critically on the accuracy of these estimates. EffectofMagnitudeofErrorsand Number ofData Points. With the exception of a few common features like average position and width, the optimal affinity distribution shown in Figure IC does not really look much like the original four-site distribution shown in Figure la. This means that the original distribution cannot be recovered from the data with an error of RSD 0.02. One wonders how accurate the data would have to be in order to recover the true distribution. Figure 2 illustrates this aspect. On the left 418 1 ENVIRONMENTAL SCIENCE &TECHNOLOGY / VOL. 29, NO. 2, 1995
side, we have shown the optimal small number of sites solutions for four different error levels, with RSD of 10-l2, 0.02, and 0.1. One observes that the original distribution can be recovered from the extremely accurate data only. As the error increases, the number of sites in the optimal spectrum decreases. The effect of the smoothness regularizing function applied to the same data is also illustrated in Figure 2 (right side). The general features of the original spectrum can still be recovered from the data containing error. At larger noise level, the details of the original distribution are completely lost. Further calculations indicate that increasing the number of experimental data points does not alter the situation substantially. Stability of Regularized Solutions. Another unpleasant feature of many solution schemes is that minor changes in the experimental data produce drastic changes in the resulting affinity distribution. We illustrate how regularization techniques resolve this problem by discussing the analysis of several different realizations of the same isotherm with RSD of 0.02 which are analogous to “repetitions” of
0.6
UlLL
8 0-3
-2
-3
0.0
-0
-6
-4
-2
-0
-6
-4
-2
-8
-6 -4 -log K
-2
-8
-6 -4 -log K
-2
-8
-6 -4 -log K
-2
FIGURE 3. Resulting affinity distributions from two different realizations ("repetition of the experiment") with RSD of 0.02. (a) Nonnegativity constraint only. (b) Regularization for small number of sites. (cl Regularization for smoothness.
the experiment (results from the first realization were already used above in Figures 1 and 2). Calculated distributions from two additional realizations with nonnegativity constraints but without additional regularizing functions are shown in Figure 3a. As one can see, the displayed distributions have an entirely different appearance even though they were obtained from very similar experimentaldata. In other words, the inversion procedure is unstable. The optimal distributions which originate from the application of small number of sites and smoothness regularizingfunctions are shown in Figure 3b,c,respectively. The resulting distributions are now very similar for both realizations of the same isotherm. The use of the regularizing function is therefore very important in order to provide the necessary stability of the inversion procedure. Only affinity spectra obtained from such stable fitting schemes can be employed to compare results on numerous materials from different laboratories in a meaningful fashion. Influence of Grid Spacing. We have also investigated the possibility of fit generated data with increasing values of the grid spacing A, We have observed that independent of the error level (RSD between 0.001 and 0.1) the grid spacing A = 0.5 was sufficient to obtain an acceptable fit for this data. Note that the combination of such widely spaced grids and regularization schemes may provide a simple way for the comparison of different materials. Sensitivity on the Model of the Errors. How sensitive is the resulting optimal distribution to incorrect assumptions about the errors in the data? The following example was examined to address this point. Data were generated with constant relative error on both concentration in solution cand adsorbed amount q. Then we determined the optimal affinity spectrum in two different ways. First, we used the correct model for the errors (Le., assumed known constant relative error in bothvariables cand q). Second, by applying the test of runs and the Durbin-Watson test, we determined the optimal affinity spectrum assuming another model for the errors, namely, no error in c (Oc,c,i= 01, and a constant but unknown relative error in q. In spite of the fact that the second model for the errors is incorrect, the determined spectra were again very similar to the ones obtained by the first approach. Thus, the optimum affinity spectrum does not depend strongly on the details of the error model. This feature is due to the stabilizing effect of the regularizing function, and for this reason, one is tempted to suspect that the optimal affinity distributions are not
very sensitive to the details of the statistical properties of the errors. As statistical properties of the errors are seldom known in detail, this finding simplifies the analysis of real experimental data substantially. Data Generated horn a Continuous Distribution. As an example of a continuous affinity distribution, we have studied the function used by Nederlof et al. (24). These authors investigate linear superposition of two normal distributions in log K centered at log K = 2.0 and log K = 3.5 and a standard deviation of 1.0, which may mimic the affinity distribution of a humic acid, for example. As in the previous example,we study constant relative errors on the concentrationin solution and adsorbed amount. Awindow from log Kmin = - 1 to log Kmax= 6 and a grid spacing A = 0.1 is used. Effect of Magnitude of Errors. We have generated 30 data points corresponding to solution concentrations from up to lo-' mol/L with different error levels, namely, RSD of 10-l2, 0.02, and 0.1. The optimal affinity distributions obtained by applying the small number of sites and the smoothness regularizing functions to these data sets with eq 11 are shown in the left and right sides of Figure 4, respectively. Only the application of the smoothness regularizing function to the extremely accurate data permits us to recover the true affinity distribution it properly. Already, with a noise level as small as is impossible to obtain the distribution quantitatively. From both examples, we conclude that attempts to recover true affinity distributions from experimental records of isotherms (with a typical error level above 0.01) are rather hopeless. Maximum Entropy Method. The data at error level RSD 0.02 were used to illustrate the application of the maximum entropy method. With the prior image being either a Gaussian in log K centered at 3.0 and of width 1.0 or a uniform distribution in log K within the spectral window, we obtain spectra which show isolated peaks in a broad, uniform background. While maximum entropy methods seem useful to recover bright spots (isolated sites) from a continuous background (8),they do not offer any advantages in the present situation.
Analysis of Experimental Data Affinity distributions were determined by applying the methods described above to several sets of published VOL. 29, NO. 2, 1995 I ENVIRONMENTAL SCIENCE & TECHNOLOGY
419
SMALL NUMBER OF SITES
I:%
i
9
2-3LL -5
-6
-4
SMOOTHNESS I
0.0 _
-2
0
-2
I
u
-6
-4
0.4
I
-2
0
-6
0.4 Q
M-l0
-2
0
(b) RSD
{
I
-4
-2
0
(c) RSD =0.02
-
-e 0.2 -
M
-2
-4
I
I
h
.
-6
01
Jy (a) RSD
,
0.0 -
FIGURE 4. Optimal affinity distributions regularized for small number of sites (left) and for smoothness (right) from generated data with a bi-Gaussian model for increasingerror levels (top down). Values of relative standard deviation (RSD) are given in the figure. The original spectrum can be recovered only from the highly accurate data set by regularizing for smoothness (top right).
experimental data. The data points were obtained by digitizing the published figures with a high resolution scanner. Sorptionof Cations with the Exception of Protons. We focus first on experimental sorption data of cations where the concentration in solution and on the sorbent was measured over a wide range. We assume no error in the concentration in solution and constant relative error of the adsorbed concentration (g,,i = 0 and 0q.j = qi). The assumption of constant relative error is certainly realistic for actual experimental data and is essential to achieve a proper weighting of the data points over wide concentration ranges. As we have shown in the previous section, this simple model for the errors together with a regularizing function appears to recover the proper optimal affinity distributions, even if the concentration in solution is subject to experimental error as well. As a general measure of the goodness of fit we employ
420
ENVIRONMENTAL SCIENCE &TECHNOLOGY / VOL. 29, NO. 2, 1995
which is the mean of the absolute values of the relative deviations. This error measure is less sensitive to infrequent outliers than x2 used for data fitting. (However, arelis not suitable as an objective function as its derivatives are discontinuous.) Ca2+on Iron Hydroxide (Kinniburgh et al. (25)). Data for sorption of Caz+ on iron hydroxide at pH 8.0 in 1.0 mol/LKN03 (Kinniburghet al. (2.9,theirFigure2) are shown here in Figure 5a. Kinniburgh et al. (25) have fitted the data by splitting the isotherm into three Freundlich-like regions. The fit of these data subject to the nonnegativity constraints (cf. eq 7 ) ,no constant or linear isotherm (SO= Kin = 0 in eq 15) and with a spectral window from log Kmin = 1 to log K,, = 7 is compared with the data points in Figure 5a. The corresponding spectrum is shown in Figure 5b. One peak lies almost on the border of the window at log K= 1. This peak corresponds in fact to alinear isotherm, and its position depends sensitively on the position of the right border of the window. Therefore, the fit is performed again by allowing for a constant and a linear isotherm. One
TABLE 2
Summary Fitted Parameters of Affinity Spectra for Experimental Data Sets authors
figure
Kinniburgh et al. (25)
5b 5c 6b 6c
Brownawell et ai. (26) Perdue et al. (27) Contescu et al. (28)
0.0841 0.0177 0.0142 0.0190 0.0238 0.0157 0.122 mmol/g 0.1 46 mmol/g 0.440 0.445 1.49 mmollg 1.84 mmol/g
7b 7c 8b 8c 9b 9b
a
%I
aThese values correspond to aab.for acid-base titration data and to
Kli"
(4.1
A
0,039
0.1 0.1 0.1 0.1 1.o 1.o 0.1 0.1 0.1 0.1 0.1 0.1
0.039 0.058 0.058 0.047 0.058 0.040 0.039 0.008 0.008 0.010 0.010
0.634 Umol 0.950 Umol 0.715 Umol 0.460 Umol 0.740 Umol 10.0 m u g 8.56 m u g 0.371 0.374 -0.578 mmol/g -0.778 mmol/g
aa
214 m u g 162 m u g
in other cases.
TABLE 3
Fitting Data of Kinniburgh et al. (25)with Increasing Power of the Small Number of Sites Regularizing Function sites
6 5 4 3 2 1
&.I*
0.039 0.039 0.039 0.041 0.059 0.115
pb
ff
0.93 0.68 0.91 0.68 0.13 0.01*
2.47 2.39 2.38 2.22 1.74 0.52*
-l1 -3-
a Resultingdeviationcq.,. Probabilitypforthetestof runs. DurbinWatson parameter d. Asterisked (*) values indicate statistically significant trends in the residuals.
obtains the spectrum shown in Figure 512,which includes a linear isotherm but the constant being zero (see Table 2). Now the right-most peak has vanished, and alinear isotherm appears which describes the effect of all low-affinity sites outside the window. One also observes that the spectra shown in Figure 5b,c have quite different appearances, which means that the fitting procedure is unstable. This feature was already discussed with the model data, and we have observed that the stability can be restored with a regularizing function (cf. eq 12). The effect of increasing power of the regularization for a small number of sites is illustrated in Table 3. By increasingthe parameterl, the number of sites decreases and the mean deviation arelincreases. Note, however, that a substantial increase of the mean deviation happens only if we decrease the number of sites from two to one. So one expects that two sites will still provide a good description of the data. This can be confirmed by applying the tests discussed earlier. In Table 3, we report the probability p for the test of runs and the parameter d of the DurbinWatson test. For the test of runs, random arrangement is unlikely if p < 0.05 on this probability level. On the same probability level, we have no correlation between the residuals if d > 1.41 while for d < 1.22 correlation is likely to be present. Therefore, only the fit which involves a single site is significantlyworse than all other fits, and the optimal small number of sites spectrum (shown in Figure 6b) consists of two sites and a linear isotherm (see also Table 2). The resulting fit shown in Figure 6a is indeed as good as the fit without the regularizingfunction shown in Figure 5a. Using the same approach, we find the optimal smooth distribution by applying the smoothness regularizingfunc-
V I
-0
I
I
I
-4 -2 log c (mol Ca/L) -6
O1 Q
an -20
d
-4
I
-0
-6
1
I
-4
-2
-log K
Ol
-0
-6
-4
-2
-log K FIGURE 5. Sorption isotherm of Ca*+ on iron hydroxide. (a) Experimental data points from Kinniburgh et at. (29and the virtually indistinguishable calculated isotherms subject to nonnegetivity constraintsonly. (b) Resulting affinity distribution without invoking a linear isotherm for fitting. IC) Resulting affinity distribution with an additional linear isotherm. Note that in spite of the different appearance of both spectra the fit is perfect. VOL. 29, NO. 2. 1995 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
421
n
n
-I1
bo
&
L
-1
8
v
e -3 eo 0
4
-5 I
-8
-5 I
I
I
-6
-4
-2
- 0
-6
-2
log c (mol/L)
log c (mol Ca/L)
O1 Q
- 1
O l
eo 0
-2 -3
o’8
-
-2
I
-8
-6
-4 -log K
-2
1
-log K
0.4
I
0.0 0.0
-0
-6
-4 -log K
-2
FIGURE 6. Sorption isotherm of CaZ+ on iron hydroxide. (a) Experimental data points from Kinniburgh et al. (25) with calculated isotherms obtained for two different regularizing functions. Corresponding optinialeffinity spectra as a function of common logarithm (basis 10) of the affinity constant for (b) small number of sites and (c) smoothness (left peak is magnified by a factor of 100).
tion. The result, which also involves alinear isotherm (see Table 2), is shown in Figure 6c. The quality of the fit (see Figure 6a) is essentially identical to the one which is based on the two-site spectrum. Often, one may be interested in describing the isotherm with a discrete but coarse spectrum. The present (and other) experimental data sets can be described in an adequate fashion with a grid spacing of A = 1.0. The fit subject to nonnegativity constraintswithout any additional regularizing functions leads to a five-site spectrum and a linear isotherm (see Table 2). Regularizing for the small number of sites, we can reduce the number of sites down to three and a linear isotherm without loss of quality of the fit (see Table 2). The resulting three sites with log Kof 3, 4, and 5 have relative concentrations 0 of 0.945,0.029, and 0.026,respectively. The interesting aspect of this approach lies in the conciseness of the quantitative description of a heterogeneous sorbent. 422
1
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 29, NO. 2,1995
4-
-10
-6
-2
-log K FIGURE 7. Sorption isotherm of dodecylpyridiniumon a soil material (EPA-12). (a) Experimental data points from Brownawell et ai. (2s) and virtually indistinguishable calculated isathems based on optimal affinity distributionobtained from different regularizations. (b) Small number of sites. (c) Smoothness (left and middle peaks are magnified by a factor of 30).
Cationic Surfactant on Soil (Brownawell et al. (26)). Sorption of an organic surfactant (dodecylpyridinium) in 0.01 M NaCl on a soil was studied by Brownawell et al. (26) (data labeled by EPA-12 in their Figure l),and their data are reproduced here in Figure 7. Best fit involving nonnegativity constraints but no additional regularization involves seven sites and a linear isotherm with a deviation a,,l= 0.028. Optimal regularization for the small number of sites reduces this number to four sites and to linear isotherm (see Figure 7b and Table 2). Regularization for smoothness results in a three-peak distribution and alinear isotherm (see Figure 7c and Table 2). Acid-Base Titration Data Consider now the analysis of acid-base titration data, which can be viewed as proton sorption isotherms. In some respects, the situation is somewhat special and requires a slightlydifferent treatment. The data are often reported as the difference between adsorbed H+ and OH- (often loosely referred to as “surface charge density”) as a function of the H+ ion activity or
concentration in solution c (or rather as a function of pH = -log c). Frequently, the initial protonation state of the sorbent is not known; the conversionof the data to aproton sorption isotherm then requires an additive constant. Depending on the point of zero charge, this constant can also be negative. For this reason, we have fitted the constant so for all data sets without invoking the nonnegativity constraint on this parameter. Since the experimental values of adsorbed amount q usually span a rather narrow range, we assume no error in pH and a constant absolute error of the adsorbed concentration (uc,i = 0 and u,,i = constant). We expect no substantial change of the resulting distributions if more complicated models for the errors are considered. As a general measure of the goodness of fit, we employ a normalized mean of the absolute values of the absolute deviations, namely (21) where qmm and qmin are the maximum and minimum value of the experimental data points of adsorbed amount, respectively. Another peculiarity of acid-base (and other) titration experiments is the exceptional smoothness of the data, when results from only a single titration are reported. In such a case, errors associated with solution preparation, temperature control, electrode drifts, and their calibration, etc. are common to either a substantial part or the entire titration curve. Experimental errors would become more noticeable if data from replicate titrations were reported. The trouble with much smooth data which probably originate from correlated errors is that statistical tests detec; nonrandomness among the residuals even for extremely good fits of the data, and the application of such a test to a single titration curve is essentiallymeaningless. The best way to decide the optimal regularization is to investigate how the mean deviation a a b s (or x2) increases as a function of the regularization parameter A. At first, aabs remains constant, then increases gradually, and finally starts to increase rapidly. The region between gradual and rapid rise marks the optimalvalue of the regularization parameter A. Nevertheless, this procedure is less straightforward than the above approach based on the test of runs and the Durbin-Watson test. Aquatic Humus (Perdue et al. (27)). The titration curve of the Satilla River aquatic humus reported by Perdue et al. (27) (data points in their Figure 4) is reproduced here in Figure 8a. We have omitted the last point for the data fitting which seems to have an exceptionally large experimental error. The fit with nonnegativity constraints but no additional regularization involves 11 sites, which leads to a deviation of a a b s = 0.002. (Note that the constant SO is also fitted.) Regularization for the small number of sites yields essentially the same quality of fit with five sites (see Figure 8b and Table 2). Regularization for smoothnessleads to a broad double-peak distribution similar to the one proposed originally by Perdue et al. (27). Aluminum Oxide (Contescu et al. 128)). As a final example,consider the recent titration data of an aluminum oxide at an ionic strength of 0.01 M discussed by Contescu et al. (their Figure 12, data labeled as 0.01 N) (28). The data are reproduced here in Figure 9 where, for reasons of clarity, we plot every third point only. The authors have applied
0.8
t y
0.6
0 log c (mol H/L) 0 Q
M 0
-1
d
-2
1
-13
-5
-9
-1
-log K
OS3
e Oe2 0.1 V.V
1
i I
-13
N \ I
I
-9 -5 -log K
I
-1
FIGURE 8. Acid-base titration of an aquatic humus as a function of common logarithm (basis 10) of the proton concentration. (a) Experimental date points from Perdue et al. (27) with calculated isotherms based on optimal affinity distribution obtained from different regularizing functions. (b) Small number of sites. (c) Smoothness.
an alternative procedure to determine a continuous affinity distribution (see Figure 14 in ref 281, which however turns out to be negative in some regions. The affinity distribution determined with the nonnegativity constraints only leads to nine sites and a average deviation of a a b s = 0.006. Regularization for the small number of sites leads to four sites and a linear isotherm (see Figure 9b and Table 2). Regularization for smoothness leads to an affinity distribution, which consists of three broad peaks (see Figure 9c and Table 2). Except at very low pH values, our result is quite similar to the affinity distribution reported by Contescu et al. (28). The main difference is that we represent the increase at low pH with a linear isotherm. This example also illustrates the limits of the present affinity distribution approach. From Figure 9a one observe that the fit around pH 4-6 has a systematic deviation from the experimental datapoints. In this region, the inequality 3 is violated, and the experimental curve cannot be VOL. 29, NO. 2, 1995 /ENVIRONMENTAL SCIENCE & TECHNOLOGY
425
1.2 n
\“ 0.6 4
0
-0.6
I
-12
I
I
I
-9
-6
-3
log c (mol H/L) 0
(b)
Q
M 0
-1
4
-2
-12
0.4
1
-6
-9
-3
-log K
-log
K
FIGURE 9. Acid-base titration of alumina. (a) Experimental data points from Contescu at al. (24with calculated isotherm based on optimal affinity distribution obtained from different regularizing functions. (b)Sinall number ofsites. (c)Smoothness. Thasystenmtic deviations around pH 4-6 originates from the too large stope of the isotherm (cf. eq 3).
represented by any linear combination of Langmuir isotherms. Aside from some special cases, however, this situation is not very common, and we have successfully applied the present methods to many other sorption data sets.
Conclusion We have shown how constrained least-squares regularization techniques can be used to determine affinity distributions (affinity spectra) from experimental sorption isotherms on heterogeneous environmental materials. One observes that the calculation of the affinity distribution is an ill-posed problem, which means that an equally good fit of the data points can be achieved with many different distributions. This problem can be circumvented by simultaneous minimization of both the sum of squares and a regularizing function subject to nonnegativity constraints for the site concentrations. Our regularization procedures select for two different affinity distributions, either with a 424
ENVIRONMENTAL SCIENCE &TECHNOLOGY / VOL. 29. NO. 2,1995
small number of sites or a smooth function. Furthermore, the regularization introduces a further stabilization of the optimization problem, such that one obtains very similar affinity distributions for similar experimental data sets. Methods that do not rely on such regularization schemes are plagued by severe instabilities and oscillations, which make any comparison of the results impossible. This type of analysis permits us to extract information on the affinity distributionwithin a certain experimentalwindowof affinity constants. To retain the stability of the affinitydistribution, the effect of the sites which lie outside this window must be represented as a combination of a constant term and a linear isotherm without saturation. The overall performance and stability of the solution scheme depends critically on the judicious choice of the regularizing function. Regularization for smoothness (cf. eq 81,which was used by many other authors in different contexts (8, 12, 18), works almost optimally and also provides the necessary stability for the calculation of the affinity spectrum. Our approach for the regularization for the small number of sites (cf.eq 9) works satisfactorily,but it may be possible to improve the procedure by a better choice of the regularizing function. Concerning the maximum entropy regularizing function, we entirely agree with the critical view of Press et al. (8);this regularization method offers, beside its “magic”, no advantages in the present context. One must realize that without additional, detailed information about the system at hand attempts to recover the true affinity distribution from an experimental record of the isotherm appear rather hopeless (29). In this case, the present approach remains a data fitting excercise but offers an empirical and very promising method for the parametrization and comparison sorption behavior of different heterogeneous materials. Regularizationfor small number of sites represents a simple and stable description for most sorption isotherms that is easily incorporated into existing speciation and transport codes. Combination of regularization together with larger grid spacing offers novel possibilitiesfor comparison materials from different sources.
Acknowledgments We thank P. Schurtenberger for an essential discussion. This research has been supported by ETH-Zurich, Swiss National Science Foundation, and U.S. Department of Energy (DOE) under Contract DE-AC06-76RLO as part of DOE’S Subsurface Science Program.
Appendix The numerical solution of optimization eqs 7, 11, and 12 has been discussed in recent numerical mathematics literature (9). “Toecho the advice of Chambers (30), we do not recommed the writing of one’s own optimization software as a casual project. It is an undertaking which requires a substantial investment because small implementation details can have a large effect on both efficiency and reliability”(31). Particularly,this applies if one attempts to solve ill-posed or constrained optimization problems and it is essential to employ mature codes tested on benchmark problems. We use the Fortran implementation of the NAG library (10) on VAX 9000 and PC. Optimization eqs 7, 11, and 12 were solved with the NAG-routine EO4UCF, which minimizes a nonlinear function subject to a mixture of nonlinear and linear constraints
together with bounds on variables (e.g. nonnegativity constraints). In the case of nonlinear constraints and nonnegativity constraints (eq 111, memory and CPU requirements of this routine are much larger than in the case of nonnegativity constraints only (eqs 7 and 12). Equation 12 was thus used for the solution of most problems. The special case of uC,i= 0 with either no regularization or the smoothness regularizingfunction can be solved in a much more efficient manner with the NAGroutine EOQNCF,which minimizes a sum of squares of linear functions subject to nonnegativity constraints.
Literature Cited (1) Langmuir, I. I. Am. Chem. SOC. 1918, 40, 1361-1403. (2) Pauling, L.; Pressman, D.; Grossberg, A. L. I. Am. Chem. SOC. 1944, 66, 784-792. (3) Sips, R. J. I. Chem. Phys. 1948, 16, 490-495. (4) Rudzinski,W.; Everett, D. H. Adsorption o f h e s o n Heterogeneous Surfaces; Academic: New York, 1992; Chapter 11. (5) Dzombak, D. A.; Fish, W.; Morel, F. M. M. Environ. Sci. Technol. 1986, 20, 669-675. (6) Koopal, L. K.; Vos, C. H. W. Langmuir 1993, 9, 2593-2605. (7) Tikhonov, A. N.; Goncharsky, A. V. Ill-posed problems in the Natural Sciences; Mir: Moscow, 1987. (8)Press, W. H.; Teukolsky, S. A.; Vetterling, W. T.; Flannery, B. P.
(9) (10) (1 1)
(12) (13)
Numerical recipes in FORTRAN, The art of scientific computing, 2nd ed.; Cambridge University Press: Cambridge, 1992. Gill, P. E.; Murray, W.; Wright, M. H.Practical Optimizations; Academic: New York, 1981. NAG Fortran Library,Mark 15, Numerical Algorithm Group Ltd, Wilkinson House, Jordan Hill Road, Oxford, UK OX2 8DR. Smith, R. C., Grandy,W. T., Eds. Maximum-Entrowand Bayesian Methods in Inverse Problem; Reidel: Dortrecht, 1985. Provencher, S. W. Comput. Phys. Commun. 1982,27,213-227. Potton, J. A.; Daniell, G. J.; Rainford, B. D. I. Appl. Crystallogr. 1988, 21, 663-668.
(14) Stanley, B.I.; Bialkowski, S. E.; Marshall, D. B.Ana2.Chem. 1993, 65, 259-267. (151 Stanley, B.J.; Guiochon, G.J. Phys. Chem. 1993,97,8098-8104. (16) Brown, L. F.; Travis, B. J. In Fundamentals ofAdsorption;Myers, A. La,Belfort, G.,Eds.; New York Engineering Foundation: New York, 1984. (171 Draper, N. R.; Smith,H. Applied Regression Analysis; John Wiley & Sons: New York, 1981. (18) Phillips, D. L.I, ACM 1962, 9, 84-97. (19) Leuenberger, B.; Schindler, P. W. Anal. Chem. 1986,58, 14711474. (20) Brassard, P.; Kramer, J, R.; Collins, P. V. Environ. Sci. Technol. 1990,24, 195-201. (211 Swed, F. S.; Eisenhard, C. Ann. Math. Stat. 1943, 14, 66-87. (22) Durbin, J.; Watson, G. S. Biomem’ka 1951, 38, 159-178. (23) Gamble, D. S.; Langford, C. H. Environ. Sci. Technol. 1988,22, 1325-1336. (24) Nederlof, M. M.; Van Riemsdijk, W. H.; Koopal, L. K. Environ. Sci. Technol. 1992, 26, 763-771. (25) Kinniburgh, D. G.; Jackson, M. L. Soil Sci. SOC.Am. I. 1982,46, 56-61. (26) Brownawell, B. J.; Chen, H.; Collier, J. M.; Westall, J. C. Environ. Sci. Technol. 1990, 24, 1234-1241. (27) Perdue, E. M.; Lytle, C. R. Environ. Sci. Technol. 1983, 17,654660. (28) Contescu, C.; Jagiello,J.; Schwarz, J. A. Langmuir 1993,9,17541765. (29) Borkovec, M.; Koper, G. J. M. Langmuir 1994, 10, 2863-2865. (30) Chambers, J. M. Biornetriku 1973, 60, 1-13. (31) Seber, G. A. F.; Wild, C. J. Nonlinear Regression; John Wiley: New York, 1989.
Received for review April 21, 1994. Revised manuscript received October 24, 1994. Accepted October 25, 1994.@ ES940244H
@Abstractpublished in Advance ACSAbstracts, December 1,1994.
VOL. 29. NO. 2, 1995 / ENVIRONMENTAL SCIENCE &TECHNOLOGY
425