Environ. Sci. Technol. 1996, 30, 2586-2590
Quantifying Relationships between Near-Infrared Reflectance Spectra of Lake Sediments and Water Chemistry
MATS B. NILSSON,*,† EIGIL DÅBAKK,‡ TOM KORSMAN,§ AND INGEMAR RENBERG§
Department of Forest Ecology, SLU, S-901 83 Umeå, Sweden, and Department of Organic Chemistry and Department of Environmental Health, Umeå University, S-901 87 Umeå, Sweden
One of the most useful approaches to long-term monitoring of aquatic systems is the analysis of lake sediments. Biological indicators, such as diatoms, preserved in the sediments are widely used. We suggest that near-infrared reflectance (NIR) spectroscopy of lake sediments could become a rapid and cost-effective technique for environmental monitoring of long-term changes in water quality. NIR spectra of surface sediments from Swedish lakes were used to establish relationships between sediment properties and measured lake water chemistry. Predictive models for inferring total phosphorus (TP), pH, and total organic carbon (TOC) from sediment NIR data were developed using partial least squares regression. The model for inferring lake water TP (n = 33 lakes) captured 83% of the variance, while the explained variance for pH (n = 52 lakes) and TOC (n = 25 lakes) was 85 and 68%, respectively. We also used the TP model to evaluate the effect of inaccuracy in measured lake water chemistry on model performance, i.e., on the amount of explained variance. The inaccuracy in measured lake water chemistry corresponds to 10.5% of the total variance in the model, so the highest variance possible to model is 89.5%. This evaluation indicated that the modeled variance almost equaled the variance possible to model, which suggests that further improvement of the models should focus on enlarging the calibration data set to include more lake types.
* Corresponding author telephone: +46 90 166370-; fax: +46 90 167750; e-mail address: [email protected]. † Department of Forest Ecology. ‡ Department of Organic Chemistry. § Department of Environmental Health.
ENVIRONMENTAL SCIENCE & TECHNOLOGY / VOL. 30, NO. 8, 1996
Introduction Near-infrared reflectance (NIR) spectroscopy is a rapid, reagent-free, non-destructive technique that has become
increasingly important in industry for process monitoring and product quality control (1). Although NIR spectroscopy has been applied in a few ecological and limnological studies (2-5), the NIR technique is still largely overlooked in environmental contexts (6). In a previous paper (2), we explored the potential of NIR spectroscopic data from lake sediment cores for inferring past pH conditions of lakes. Here we examine the possibility that NIR analysis of surface-sediment samples from lakes could become a useful technique in environmental monitoring, in particular for following long-term changes in water quality. This suggestion is based on the following assumptions: (i) properties of sediments formed in any lake depend largely on the properties of the lake water, such as pH and nutrient conditions; (ii) properties of the sediments can be recorded by NIR spectroscopy; and (iii) water quality properties can be inferred from lake sediments using transfer functions obtained from modeling of NIR spectra of surface sediments and lake water chemistry variables in a calibration data set. Provided that reliable models between sediment NIR spectra and lake water chemistry data can be obtained, NIR analyses of surface-sediment samples taken from lakes, for example, every fifth year, could be used to establish a time series of water-chemistry parameters. Such NIR-inferred water-chemistry values would be lake basin integrated mean values for a few years and could provide cost-effective complementary or alternative data to those obtained by conventional water-chemistry analyses. Water-chemistry variables usually have considerable seasonal and annual variability. To obtain reliable monitoring series of water-chemistry parameters, several water samples each year are usually required. With the NIR approach, one sediment sample could substitute for a large number of water samples.
NIR-based water quality monitoring could become particularly useful for lakes in remote areas that are difficult to access for conventional water sampling. We test here the hypothesis that the lake water concentrations of total phosphorus (TP) and total organic carbon (TOC), as well as lake water pH, are reflected by lake sediments and can be recorded by NIR spectroscopy. To test the hypothesis, we used a data set of surface sediments and lake water chemistry.
Materials and Methods Surface-Sediment Samples and Water Chemistry. To assess relationships between NIR spectra of lake sediments and lake water TP, pH, and TOC, a data set consisting of surface-sediment samples and water-chemistry measurements was collected. Surface-sediment samples (0-1 cm) were taken in 1993 from 58 lakes using a HON-Kajak gravity corer (7). Depending on sediment accumulation rate, 1 cm of surface sediment represents between 2 and 5 years in the lakes included in the calibration data set. Typically, the lakes were 10-30 ha, 5-20 m deep, headwater lakes from southern Sweden, spanning a range from severely acidified to eutrophic lakes (Table 1). Triplicate sediment cores were taken from the deepest point of each lake, and the samples were frozen in the field immediately after coring.
S0013-936X(95)00953-9 CCC: $12.00 © 1996 American Chemical Society
TABLE 1
Statistics for Three Lake Water Chemistry Variables (TP, pH, and TOC) Modeled by NIR Spectroscopy of Lake Sediments
variable           n    mean   SD     median   min   max
TP10 (µg L⁻¹) a    33   37.3   29.7   23.0     6     98
TP4 (µg L⁻¹) a     33   38.8   29.8   25.8     7     98
pH                 52   6.6    0.9    6.6      4.6   8.4
TOC (mg L⁻¹)       25   9.8    5.7    9.9      1.6   21
a TP10, average values for 10 years (1983-1992); TP4, average values for 4 years (1989-1992).
Water-chemistry data were obtained from both national (The Swedish Environmental Protection Agency) and
regional lake water monitoring programs. Analyses were conducted using standard methods (8). Mean values for water chemistry were used for modeling. For the arithmetic means of TP and pH, only lakes with more than four analyzed water samples were accepted; for TOC, more than three were required. With these restrictions, TP data were available from 39 lakes (range 6-98 µg L⁻¹), pH from 58 (pH 4.6-8.4), and TOC from 25 lakes (1.6-21 mg L⁻¹) (Table 1). Mean values for water chemistry were based on measurements sampled throughout the year, from 1983 to 1992. Because there were temporal trends in lake water TP in some lakes, a model with TP means for 1989-1992 (TP4) was also tested. Data on more than one water-chemistry variable from the same lake were available for only a restricted number of lakes. This made it difficult to evaluate the correlation between these variables, and hence whether the NIR models were totally independent. However, using these restricted data sets, the following correlations were obtained: TP/pH, n = 24, r = 0.66; TP/TOC, n = 19, r = 0.57; pH/TOC, n = 18, r = -0.16.
Near-Infrared Reflectance (NIR) Spectroscopy. Chemical bonds between light atoms such as C-H, O-H, and N-H generally have high vibrational frequencies and strong vibrational overtones detectable in the near-infrared region. These spectroscopic features make this region suitable for registration of the chemical composition of organic material. A linear relationship between absorbance and concentration, i.e., the Beer/Lambert/Bouguer relationship, is exhibited in the majority of biological and agricultural applications (cf. refs 9 and 10). In the laboratory, the frozen sediment samples were freeze-dried and ground in a mortar. NIR spectra were recorded using a NIRSystem 6500 instrument (NIRSystems Inc., Silver Spring, MD). The instrument measures diffuse reflectance, which is transformed to absorbance values according to the relationship log (1/reflectance).
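The reflectance-to-absorbance transformation described above can be sketched in a few lines. This is a minimal illustration, not the instrument's own processing: the function name and the example readings are hypothetical, and base-10 logarithms are assumed because the text gives only "log".

```python
import math

def reflectance_to_absorbance(reflectance):
    """Convert diffuse reflectance values (0 < R <= 1) to apparent absorbance.

    Assumes a base-10 logarithm, i.e. A = log10(1 / R); the source text
    only states "log (1/reflectance)".
    """
    absorbance = []
    for r in reflectance:
        if not 0.0 < r <= 1.0:
            raise ValueError(f"reflectance must be in (0, 1], got {r}")
        absorbance.append(math.log10(1.0 / r))
    return absorbance

# Hypothetical reflectance readings at three wavelengths (illustrative only).
example_reflectance = [1.0, 0.5, 0.1]
example_absorbance = reflectance_to_absorbance(example_reflectance)
```

A sample that reflects all incident light gets an apparent absorbance of 0, and each tenfold drop in reflectance adds one absorbance unit.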
Data were collected at 4-nm intervals between 400 and 2500 nm, yielding 525 data points. One NIR spectrum was collected from each sediment sample, giving three spectra from each lake. Partial Least Squares (PLS) Regression Modeling. Lake water chemistry measurements were regressed on NIR spectra of surface sediment samples by PLS regression (11) using the Unscrambler 5.5 software, CAMO A/S, Trondheim, Norway (12). From the three original spectra from each lake, an average spectrum was calculated. This was done to facilitate the computations with the Unscrambler software, particularly the cross-validation procedure. If replicates are used, a false improvement of the prediction error is achieved in the cross-validation procedure, unless the replicates are simultaneously removed in the same cross-validation segment.
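The replicate handling described above can be illustrated with a short sketch: replicate spectra are averaged per lake, and cross-validation segments are then assigned per lake so that replicates can never end up in different segments. The lake names, the two-point spectra, and the round-robin assignment are assumptions made for illustration; the text does not state how the Unscrambler software formed its segments.

```python
def average_spectra(replicates):
    """Average replicate spectra (equal-length lists) point by point."""
    n_points = len(replicates[0])
    if any(len(s) != n_points for s in replicates):
        raise ValueError("all replicate spectra must have the same length")
    return [sum(s[k] for s in replicates) / len(replicates)
            for k in range(n_points)]

def assign_segments(lake_ids, n_segments=7):
    """Assign each lake (not each replicate) to one cross-validation segment.

    Round-robin assignment is an assumption made for this sketch.
    """
    return {lake: idx % n_segments for idx, lake in enumerate(lake_ids)}

# Hypothetical lakes with three short replicate spectra each.
spectra_by_lake = {
    "lake_A": [[0.10, 0.20], [0.12, 0.22], [0.14, 0.24]],
    "lake_B": [[0.30, 0.40], [0.30, 0.40], [0.33, 0.43]],
}
mean_spectra = {lake: average_spectra(reps)
                for lake, reps in spectra_by_lake.items()}
segments = assign_segments(sorted(spectra_by_lake))
```

Because segments are keyed by lake, leaving out one segment removes all information about its lakes from the calibration, which is the condition the text gives for an honest prediction error.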
In PLS regression, the original independent variables (absorbance values in the NIR spectra) are replaced by a few orthogonalized components. This is done by utilizing the covariance matrix between the dependent variable (lake water chemistry) and the independent variables (NIR spectra), thereby solving the problem of highly correlated independent variables. The number of components that yields the model with the best predictive performance, i.e., the lowest standard error of prediction (SEP), is determined by internal cross-validation (13). We used seven cross-validation segments in all computations. The exact number of cross-validation segments is not critical, but using as many cross-validation segments as objects tends to underestimate SEP, i.e., the model appears to be better than it is (cf. ref 11). The standard error of prediction and the explained variance for prediction were also determined from this internal cross-validation procedure. The SEP, i.e., the accuracy with which the water chemistry of any lake within the data set can be predicted using transfer functions based on the other lakes, expresses the predictive power of a model. SEP was calculated as:
$$\mathrm{SEP} = \sqrt{\frac{1}{I-1}\sum_{i=1}^{I}\left(y_i - \hat{y}_i\right)^2} \tag{1}$$
where y_i is the observed value, ŷ_i is the predicted value, and I is the number of lakes. Several pretreatment procedures of data were performed to find the model that gave the lowest prediction error of the lake water chemistry. As a first step, outliers were identified and removed from further analysis. Both samples with high leverage and samples with a large distance to a principal component model calculated from the spectral data (i.e., samples with high residual standard deviation in the spectral data compared to the average residual standard deviation) were considered outliers. For a more detailed discussion on outlier treatment, see Martens and Naes (11). This reduced the data sets as follows: for TP and for pH, six samples each were removed, while for TOC none of the samples were removed. For the modeling of TP and TOC, both log-transformed and original chemical values were used. If transformed values were used in calibration, the predictions were back-transformed for the calculation of SEP. The best predictive models were obtained with non-transformed values of the dependent variable. Inclusion of H+, pH, and TOC or TP as independent variables in order to handle possible interactions between the modeled variable and the other lake water chemistry variables was tested. These additions did not improve any of the models.
Origin of Unexplained Variance. The unexplained variance (1 - R²) in the final model emanates both from error related to the measurement of the water-chemistry variable (i.e., how accurately the mean values describe the real water chemistry of the lake) and from the model error. The source of the unexplained variance sets the limits for what is possible to model. The basis for the estimation of the variance possible to model follows below. When calculating a linear regression model
$$Y = Xb + f \tag{2}$$
it is assumed that all variance belongs to y and that X is error-free. The largest variance in y possible to model is then that around the mean:
$$\mathrm{Var}_{\mathrm{tot}} = \frac{1}{I-1}\sum_{i=1}^{I}\left(y_i - \bar{y}\right)^2 \tag{3}$$
The modeled variance in y can be written as
$$\mathrm{Var}_{\mathrm{mod}} = \frac{1}{I-1}\sum_{i=1}^{I}\left(\bar{y} - \hat{y}_i\right)^2 \tag{4}$$
and the residual variance (i.e., 1 - R²) not explained by the model is calculated according to
$$\mathrm{Var}_{\mathrm{res}} = \frac{1}{I-1}\sum_{i=1}^{I}\left(y_i - \hat{y}_i\right)^2 \tag{5}$$
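The three variance terms of eqs 3-5 can be computed side by side, as in the sketch below. The function name and the four observed/predicted pairs are hypothetical and are not data from this study. The square root of the residual term has the same form as the SEP of eq 1 when the predicted values come from cross-validation.

```python
import math

def variance_terms(observed, predicted):
    """Return (Var_tot, Var_mod, Var_res) following eqs 3-5, divisor I - 1."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long sequences with at least two values")
    mean_obs = sum(observed) / n
    var_tot = sum((y - mean_obs) ** 2 for y in observed) / (n - 1)
    var_mod = sum((mean_obs - y_hat) ** 2 for y_hat in predicted) / (n - 1)
    var_res = sum((y - y_hat) ** 2
                  for y, y_hat in zip(observed, predicted)) / (n - 1)
    return var_tot, var_mod, var_res

# Hypothetical observed and predicted values for four lakes (illustrative only).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
var_tot, var_mod, var_res = variance_terms(observed, predicted)

# Same form as eq 1 when the predictions are cross-validated.
sep = math.sqrt(var_res)
```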
This unexplained variance can be partitioned according to
$$\mathrm{Var}_{\mathrm{res}} = \mathrm{Var}_{u} + \mathrm{Var} \tag{6}$$
where Varu is the variance in the dependent variable, often called pure error, composed of a repeatability error and a reproducibility error, and Var is the lack of fit or the model error. The lack of fit (Var) indicates systematic variation in the y variable that can be modeled by the addition of appropriate X variables or a larger data set covering all the systematic variation in y. The size of the pure error (Varu) sets the upper limit of the variance possible to model (R², cf. eq 4). In this study and most studies of biological and related data, the pure error (Varu) is dominated by the repeatability error, which in this study originates from the natural variation in the lake chemistry variables. This variance can be estimated as the squared standard deviation (SD) divided by the number of samples, i.e., the squared standard error (SE), where the SE measures the accuracy with which the y variable is determined. The variance related to the dependent variable was estimated as
$$\mathrm{Var}_{u} = \frac{1}{I}\sum_{i=1}^{I}\frac{S_i^2}{n_i} \tag{7}$$
where I is the number of lakes, S_i² is the variance in the dependent variable for lake i, and n_i is the number of measurements in lake i. The maximum variance possible to model, expressed as a percentage, is then (1 - (Varu/Vartot)) × 100. For example, the measurement error in TP4 was estimated, according to eq 7, at 10.5% of the total variance, leaving a maximum of 89.5% to model.
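The pure-error estimate of eq 7 and the resulting ceiling on the variance possible to model can be sketched as follows. The measurement series are invented for illustration, the function names are hypothetical, and the use of the sample variance (divisor n_i - 1) for S_i² is an assumption, since the text does not state the divisor.

```python
import statistics

def pure_error_variance(measurements_by_lake):
    """Eq 7: mean over lakes of S_i^2 / n_i (squared standard error of each lake mean).

    S_i^2 is taken as the sample variance (divisor n_i - 1); this is an
    assumption of the sketch.
    """
    terms = []
    for values in measurements_by_lake:
        n_i = len(values)
        s2_i = statistics.variance(values)
        terms.append(s2_i / n_i)
    return sum(terms) / len(terms)

def max_modelable_percent(var_u, var_tot):
    """Upper limit of explainable variance: (1 - Var_u / Var_tot) * 100."""
    return (1.0 - var_u / var_tot) * 100.0

# Hypothetical repeated TP measurements for three lakes (not data from the study).
measurements_by_lake = [
    [10.0, 12.0, 14.0, 16.0],
    [30.0, 34.0, 38.0, 42.0],
    [60.0, 70.0, 80.0, 90.0],
]
lake_means = [statistics.mean(v) for v in measurements_by_lake]
var_u = pure_error_variance(measurements_by_lake)
var_tot = statistics.variance(lake_means)  # eq 3 applied to the lake means
ceiling = max_modelable_percent(var_u, var_tot)
```

Lakes with few or highly variable measurements contribute the largest terms to Varu, so adding water samples for those lakes raises the ceiling most.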
Results Phosphorus. The best model of measured lake water TP regressed on NIR spectra of surface-sediment samples was achieved using TP averages from the last 4 years prior to sediment sampling. The model gave R² = 0.83, and internal cross-validation prediction gave R² = 0.73 (Figure 1a; Table 2). Using 10-year averages for the same samples gave R² = 0.80 for the calibration and R² = 0.64 from the internal cross-validation procedure. The modeled variance is about the same, while the explained variance after prediction is higher using the TP average over the last 4 years. The prediction error measured as SEP for TP4 was 15.5 µg L⁻¹, representing 17% of the total range (7-98 µg L⁻¹) and 52% of the standard deviation.
FIGURE 1. Measured versus estimated lake water total phosphorus (TP) (a), pH (b), and total organic carbon (TOC) (c) obtained from near-infrared spectroscopy of surface sediment samples. The TP model was based on TP data from 1989 to 1992.
TABLE 2
Partial Least Squares Regression Statistics for Calibration Model and Internal Cross-Validated Predictions
lake water variable   n    no. of components   R²est a   SEE b   R²pred c   SEP d
TP10 (µg L⁻¹)         33   6                   0.80      13.2    0.64       18.4
TP4 (µg L⁻¹)          33   6                   0.83      12.1    0.73       15.5
pH                    52   7                   0.85      0.3     0.78       0.4
TOC (mg L⁻¹)          25   4                   0.68      3.2     0.40       4.5
a Explained variance in the dependent variable in calibration model. b Standard error of estimate, i.e., standard error for the dependent variable in the calibration model. c Explained variance in the dependent variable by internal cross-validation prediction. d Standard error of prediction, i.e., standard error for the dependent variable by internal cross-validation prediction.
According to eq 7, the variance (Varu) in the measured TP4 was 93 (µg L⁻¹)²,
representing 10.5% of the total variance (eq 3), and for TP10 was 40.5 (µg L⁻¹)², representing 4.6% of the total variance (eq 3).
pH. The calibration model between pH and NIR spectra of sediment samples gave R² = 0.85 (Figure 1b; Table 2), and internal cross-validation predictions gave R² = 0.78. The standard error of prediction is 0.4 pH unit, corresponding to 11% of the range (pH 4.6-8.4) and 47% of the standard deviation. The variation in measured lake water pH during the last 10 years is 1 pH unit in many lakes, representing approximately one-fourth of the range of average pH for the sampled lakes.
TOC. The calibration model for TOC is calculated on the smallest data set, with only 25 lakes. This model gave the lowest R² = 0.68 for calibration and R² = 0.40 for prediction (Figure 1c; Table 2). SEP was 4.5 mg L⁻¹, which can be compared with the SD (5.7 mg L⁻¹).
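The percentages quoted for TP can be re-derived from the values reported in Tables 1 and 2 and in the text above. The sketch below is only an arithmetic check: it assumes the total variance (eq 3) equals the squared SD from Table 1, and the variable names are hypothetical. It also expresses the TP4 R² values relative to the ceiling set by the pure error.

```python
def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole

# Values reported in Tables 1 and 2 and in the Results text.
sd_tp4, sd_tp10 = 29.8, 29.7            # SD of lake means, µg/L
var_u_tp4, var_u_tp10 = 93.0, 40.5      # pure error from eq 7, (µg/L)^2
sep_tp4, tp4_min, tp4_max = 15.5, 7.0, 98.0
r2_est_tp4, r2_pred_tp4 = 0.83, 0.73

# Pure error as a share of total variance (assumed equal to SD squared).
pure_error_tp4 = percent(var_u_tp4, sd_tp4 ** 2)
pure_error_tp10 = percent(var_u_tp10, sd_tp10 ** 2)
ceiling_tp4 = 100.0 - pure_error_tp4    # maximum variance possible to model, %

# SEP relative to the range and the SD of TP4.
sep_of_range = percent(sep_tp4, tp4_max - tp4_min)
sep_of_sd = percent(sep_tp4, sd_tp4)

# Explained variance relative to what is possible to model.
est_share_of_ceiling = percent(100.0 * r2_est_tp4, ceiling_tp4)
pred_share_of_ceiling = percent(100.0 * r2_pred_tp4, ceiling_tp4)
```

With these inputs the pure error comes out at about 10.5% (TP4) and 4.6% (TP10) of the total variance, and the SEP at about 17% of the range and 52% of the SD, matching the reported figures.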
Discussion The three models for the prediction of lake water TP, pH, and TOC from NIR spectra of sediments captured a large fraction of the variance in the three water-chemistry variables. Furthermore, the low correlation coefficients between the lake water variables indicate that the three inference models are separate models, and one model cannot be used to predict all three variables simultaneously. For TP, 83% of the variance was explained, and for pH and TOC, 85% and 68%, respectively, were explained (Figure 1; Table 2). These results support the hypothesis that lake sediments contain information about phosphorus, pH, and TOC conditions in the water of the lakes in which the sediments are formed. Several biological and chemical mechanisms might be involved in causing such relationships, and explaining which components in the sediments are actually related to the NIR signals remains a challenge for future research. A general explanation for the observed relationships could be that phosphorus, pH, and the organic carbon of lake waters are major factors controlling the composition of the lake biota and the particulate matter of the water, and thus the sediment properties. This explanation is supported by a study in oligotrophic lakes in Ontario, where Malley et al. (5) collected seston on filters, measured these by NIR spectroscopy, and then analyzed C, N, and P by standard wet chemistry methods. Correlations between chemically measured and NIR-predicted values gave very high coefficients of determination, i.e., the seston of the water contained quantitative C-, N-, and P-related information that was measurable with NIR spectroscopy.
FIGURE 2. Measured lake water TP for Lake Vrångsjön from 1983 to 1992.
TP, pH, and TOC in lake waters often vary markedly within and between years. In several lakes included in this study, the measured lake water TP values vary considerably, such as in Vrångsjön, where the within-lake variation covers slightly less than half of the TP range of the total data set (Figure 2). This within-lake variation consists not only of seasonal changes but, in some lakes, also of a temporal trend during the last 10 years. Considering this variability, the models are encouraging. The SEP for phosphorus is 15.5 µg L⁻¹, for pH 0.4 unit, and for TOC 4.5 mg L⁻¹. This variation expressed as SEP corresponds to within-lake variations that are normal for many lakes. For pH, a similar SEP value (0.52) was obtained for a previous model of NIR spectra of surface sediments and lake water pH using 21 other lakes from southern Sweden (2). A significant part of the unexplained variance in the models depends on the water-chemistry input data. For the TP4 data set (1989-1992), the water-chemistry variability accounts for 10.5% of the total variance. The modeled variance (R²est) is 93%, and the predicted variance (R²pred) is 82%, of what is possible to model. It is likely that with an increased number of measurements of TP for the lakes used to establish the calibration model, the models might be improved slightly.
Surface water acidification and eutrophication problems in Europe and North America have called for methods to reconstruct baseline conditions and long-term temporal trends in pH and TP. In particular, methods to infer past pH conditions using fossil diatoms in lake sediments have been given considerable attention over the last 20 years (14, 15), and the development of transfer functions for diatom-based TP reconstruction is in progress in several laboratories. These diatom and water-chemistry models can be used to assess the performance of the NIR approach.
The best diatom-based transfer functions for inference of lake water pH have prediction errors of about 0.3 pH unit (16, 17). The diatom transfer functions for TP inferences are based on log-transformed data, so to facilitate comparison between the diatom and the NIR approach, we developed a TP inference model from log-transformed data
(4-year average). This NIR model has slightly poorer predictive abilities (R² = 0.78) than the NIR model based on non-transformed TP data (R² = 0.83). The standard error of estimate (SEE) for this model was 0.179, which is comparable with the apparent root mean square errors of prediction ranging from 0.154 to 0.47 obtained from diatom-based models (18-22). The cross-validated SEP for the log-TP model was 0.20, which can be compared with diatom-based models with bootstrapped root mean square errors of prediction of 0.245-0.48 (20-22). It is likely that fine-tuning of diatom-based methods can lead to small improvements of the predictive power (23). To what extent the NIR approach can also be improved can only be assessed by analysis of a larger calibration data set that covers a wider environmental gradient. This could include wider TP, pH, and TOC ranges, but more importantly, it must cover all types of lakes. Like all regression models, the present transfer functions are only applicable to lakes of the types included in the data set.
Conclusions Lake sediments are archives that reflect biological and chemical conditions of the lakes. A variety of paleolimnological techniques have been developed over the last decades to decipher these archives in order to infer past and contemporary environmental conditions from the sediment records. We believe that spectroscopic methods, which so far have had limited use in paleolimnology, have great potential, and we suggest that NIR spectroscopy of lake sediments could become an alternative method for long-term monitoring of lake water chemistry. We demonstrate that information about lake water TP, pH, and TOC can be inferred from surface-sediment samples using transfer functions obtained from the modeling of NIR spectra of surface sediments and water-chemistry variables in a calibration data set. This work establishes a baseline for assessing how useful a water-quality monitoring system based on NIR analyses of surface-sediment samples could become.
Acknowledgments We thank Erik Renberg, Stefan Henriksson, and Daniel Jonsson for help with the field and laboratory work; Bo Ranneby and Paul Geladi for discussion about the statistical evaluation; and John Anderson and two anonymous reviewers for valuable comments on the manuscript. We
are grateful for the help with the selection of lakes provided by County Administrative Boards and Municipalities and for the financial support by the Swedish Environmental Protection Agency.
Literature Cited
(1) Hildrum, K. I.; Isaksson, T.; Naes, T.; Tandberg, A., Eds. Near infra-red spectroscopy: Bridging the gap between data analysis and NIR applications; Ellis Horwood: London, 1992.
(2) Korsman, T.; Nilsson, M.; Öhman, J.; Renberg, I. Environ. Sci. Technol. 1992, 26, 2122-2126.
(3) Nilsson, M.; Korsman, T.; Nordgren, A.; Palmborg, C.; Renberg, I.; Öhman, J. In Near infra-red spectroscopy: Bridging the gap between data analysis and NIR applications; Hildrum, K. I., Isaksson, T., Naes, T., Tandberg, A., Eds.; Ellis Horwood: London, 1992; pp 229-234.
(4) Nilsson, M.; Elmqvist, T.; Carlsson, U. Phytopathology 1994, 84, 764-770.
(5) Malley, D. F.; Williams, P. C.; Stainton, M. P.; Hauser, B. W. Can. J. Fish. Aquat. Sci. 1993, 50, 1779-1785.
(6) Malley, D. F.; Nilsson, M. Spectrosc. Eur. 1995, 6, 8-16.
(7) Renberg, I. J. Paleolim. 1991, 6, 167-170.
(8) Department of Environmental Assessment, Swedish University of Agricultural Sciences, Box 7050, S-750 07 Uppsala, Sweden.
(9) Osborne, B. G.; Fearn, T. Near infrared spectroscopy in food analysis; John Wiley & Sons: New York, 1986.
(10) Williams, P. C.; Norris, K. H. Near-infrared technology in the agricultural and food industries; American Association of Cereal Chemists Inc.: St. Paul, MN, 1987; 330 pp.
(11) Martens, H.; Naes, T. Multivariate calibration; John Wiley & Sons: New York, 1989.
(12) Esbensen, K.; Schönkopf, S.; Midland, T. Multivariate analysis in practice; Camo AS, Wennbergs Trykeri AS: Trondheim, 1994.
(13) Ståhle, L.; Wold, S. J. Chemom. 1987, 1, 185-196.
(14) Dixit, S. S.; Smol, J. P.; Kingston, J. C.; Charles, D. F. Environ. Sci. Technol. 1992, 26, 23-33.
(15) Anderson, N. J. Ecol. Modell. 1995, 78, 149-172.
(16) Birks, H. J. B.; Line, J. M.; Juggins, S.; Stevenson, A. C.; ter Braak, C. J. F. Philos. Trans. R. Soc., London B 1990, 327, 263-278.
(17) Cumming, B. F.; Smol, J. P.; Kingston, J. C.; Charles, D. F.; Birks, H. J. B.; Camburn, K. E.; Dixit, S. S.; Uutala, A. J.; Selle, A. R. Can. J. Fish. Aquat. Sci. 1992, 49, 128-141.
(18) Hall, R. I.; Smol, J. P. Freshwater Biol. 1992, 27, 417-434.
(19) Anderson, N. J.; Odgaard, B. V. Hydrobiologia 1994, 269/270, 411-422.
(20) Anderson, N. J.; Rippey, B. Freshwater Biol. 1994, 32, 625-639.
(21) Bennion, H. Hydrobiologia 1994, 276, 391-410.
(22) Reavie, E. D.; Hall, R. I.; Smol, J. P. J. Paleolim. 1995, 14, 49-67.
(23) Birks, H. J. B. In Ecology and palaeoecology of lake eutrophication; Patrick, S. T., Anderson, N. J., Eds.; DGU Service Report 7; Geological Survey of Denmark: Copenhagen, 1995; p 41.
Received for review December 22, 1995. Revised manuscript received March 22, 1996. Accepted April 4, 1996.
ES950953A
Abstract published in Advance ACS Abstracts, June 1, 1996.