Environ. Sci. Technol. 2003, 37, 4685-4693
Critical Conceptualism in Environmental Modeling and Prediction
G. CHRISTAKOS*
Center for the Advanced Study of the Environment, Department of Environmental Sciences and Engineering, School of Public Health, University of North Carolina, Chapel Hill, North Carolina 27599-7431
Many important problems in environmental science and engineering are of a conceptual nature. Research and development, however, often becomes so preoccupied with technical issues, which are themselves fascinating, that it neglects essential methodological elements of conceptual reasoning and theoretical inquiry. This work suggests that valuable insight into environmental modeling can be gained by means of critical conceptualism, which focuses on the software of human reason and, in practical terms, leads to a powerful methodological framework of space-time modeling and prediction. A knowledge synthesis system develops the rational means for the epistemic integration of various physical knowledge bases relevant to the natural system of interest in order to obtain a realistic representation of the system, provide a rigorous assessment of the uncertainty sources, generate meaningful predictions of environmental processes in space-time, and produce science-based decisions. No restriction is imposed on the shape of the distribution model or the form of the predictor (non-Gaussian distributions, multiple-point statistics, and nonlinear models are automatically incorporated). The scientific reasoning structure underlying knowledge synthesis involves teleologic criteria and stochastic logic principles, which have important advantages over the reasoning method of conventional space-time techniques. Insight is gained in terms of real-world applications, including the following: the study of global ozone patterns in the atmosphere using data sets generated by instruments on board the Nimbus 7 satellite and secondary information in terms of total ozone-tropopause pressure models; the mapping of arsenic concentrations in Bangladesh drinking water by assimilating hard and soft data from an extensive network of monitoring wells; and the dynamic imaging of probability distributions of pollutants across the Kalamazoo River.
* Corresponding author phone: (919)966-1767; fax: (919)966-7911; e-mail: [email protected].
10.1021/es020932y CCC: $25.00 Published on Web 09/19/2003
© 2003 American Chemical Society

Introduction
In environmental research and development we distinguish between (i) investigation techniques (e.g., solving a physical equation, constructing a simulation model, or designing an experimental procedure) and (ii) conceptual reasoning frameworks (e.g., developing a methodology for applying the laws of logic to environmental situations, building hypotheses, or integrating physical knowledge bases). The vast majority of environmental studies belong to group i, whereas much less attention has been given to group ii. This is unfortunate, because conceptual problems can, in general, have more serious consequences than empirical anomalies. Methodological weaknesses, for example, constitute acute conceptual problems for the environmental theories and techniques that exhibit them. Understanding the conceptual system and reasoning principles is often the primary task, whereas mathematical details are secondary to the conceptual organization and logic of scientific inquiry. Accordingly, unresolved issues in environmental science and technology have for several years pointed toward the need to improve the conceptual framework of physical modeling and prediction (1-4). It is a vital sign of scientific progress that environmental scientists and engineers constantly need to reconstruct the conceptual mechanisms and structures that they use.

FIGURE 1. (i) Scientific reasoning process underlying conventional space-time prediction and mapping techniques. (ii) Scientific reasoning structure of critical conceptualism.

Figure 1(i) summarizes the basic reasoning method underlying conventional space-time prediction and mapping techniques (e.g., spatial regression, kriging, basis functions, polynomial interpolation, and neural networks (5, 6)). The basic steps of the method are as follows. One starts with a set of observations relevant to the phenomenon of interest and, by means of induction, generates a hypothesis (e.g., a theoretical covariance model fitted to the observations). Predictions are derived from the hypothesis by means of deduction (e.g., assuming global validity of the covariance model and minimizing the statistical error, predictions are generated at specified points in the space-time domain of interest). The predicted values at a selected set of points are then compared with experimental results.
If prediction and experiment are in good agreement, the procedure is verified (or confirmed, according to a certain group of methodologists of science (7)); if not, one must go back to a previous stage of the method and revise the hypothesis. The conventional reasoning method of Figure 1(i) has been quite successful, particularly in the initial description/correlation stage of an environmental study, but it suffers from serious limitations when the prediction/explanation stage is the focus of the study (3, 4).
VOL. 37, NO. 20, 2003 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
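As a hedged illustration of the conventional loop of Figure 1(i), the Python sketch below fits a covariance model to observations (induction), predicts at held-out points by simple kriging (deduction), and compares the predictions with the withheld values (verification). The exponential covariance family, the grid-search fit, the known-mean assumption, and the synthetic sine-wave data are illustrative choices made here, not details taken from this paper.

```python
import numpy as np

def exp_cov(h, sill, corr_len):
    """Exponential covariance C(h) = sill * exp(-|h| / corr_len) (assumed model family)."""
    return sill * np.exp(-np.abs(h) / corr_len)

def fit_covariance(x, z, corr_lens, n_bins=10):
    """Induction: pick the correlation length whose model best fits the empirical covariance."""
    zc = z - z.mean()
    sill = zc.var()
    h = np.abs(x[:, None] - x[None, :]).ravel()
    prod = np.outer(zc, zc).ravel()
    # Bin point pairs by lag and average the products to estimate the empirical covariance.
    idx = np.digitize(h, np.linspace(0.0, h.max(), n_bins))
    lags = np.array([h[idx == b].mean() for b in np.unique(idx)])
    emp = np.array([prod[idx == b].mean() for b in np.unique(idx)])
    errors = [np.sum((exp_cov(lags, sill, c) - emp) ** 2) for c in corr_lens]
    return sill, corr_lens[int(np.argmin(errors))]

def simple_kriging(x, z, x0, sill, corr_len, mean):
    """Deduction: minimum-error-variance linear predictor with a known (assumed) mean."""
    K = exp_cov(x[:, None] - x[None, :], sill, corr_len)   # data-to-data covariances
    k = exp_cov(x[:, None] - x0[None, :], sill, corr_len)  # data-to-target covariances
    w = np.linalg.solve(K, k)                              # kriging weights, one column per target
    pred = mean + w.T @ (z - mean)
    var = sill - np.sum(w * k, axis=0)                     # kriging error variance
    return pred, var

# Synthetic 1-D observations (assumption); every 4th point is withheld for checking.
gen = np.random.default_rng(0)
x_all = np.linspace(0.0, 10.0, 40)
z_all = np.sin(x_all) + 0.1 * gen.standard_normal(x_all.size)
train = np.arange(x_all.size) % 4 != 0
x, z = x_all[train], z_all[train]

sill, corr_len = fit_covariance(x, z, np.array([0.5, 1.0, 2.0, 4.0]))
pred, var = simple_kriging(x, z, x_all[~train], sill, corr_len, z.mean())
# Comparison with "experiment": a large error sends the analyst back to the hypothesis.
rmse = np.sqrt(np.mean((pred - z_all[~train]) ** 2))
```

Because no nugget term is included, this predictor reproduces the data exactly at the data points; the only route back from a poor `rmse` is to revise the fitted covariance hypothesis, which is the revise-and-retry cycle described above.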
This work presents a methodological framework of space-time prediction and mapping with a wide range of applications in environmental sciences and demonstrates some of its significant advantages over the conventional reasoning method above. The main elements of this framework (which leads us to study the nature of the frame of thought as applied in environmental problems) are as follows (8, 9):
1. critical conceptualism, organizing the concepts and logical processes that characterize the epistemic situation and appear within the scientific context of discovering, testing, and revising environmental theories and techniques;
2. environmental knowledge bases established on a closed vs open system basis, thus allowing the consideration and blending of general and specificatory knowledge sources in a rigorous and physically meaningful manner; and
3. a knowledge synthesis procedure that generates maximally informative hypotheses consistent with the core knowledge of the situation and then updates them in light of site-specific data by means of integration methods.
In environmental sciences there is a great need for methods of epistemic integration, since most natural processes and objects are closely related to each other and there are no isolated systems (4). By providing a rationale for synthesizing and processing various physical knowledge bases, the novel methodological framework eliminates conceptual stumbling blocks to a logically rigorous and physically meaningful space-time prediction theory. Notably, this framework derives previous techniques as its limiting cases under certain restrictions on the modeling assumptions and the knowledge bases used, a fact that demonstrates its generalization power. Since the main concern of the present work is the software of critical reasoning (i.e., the methodological principles underlying environmental modeling and prediction techniques), certain technical issues are not considered in detail, although adequate references are provided for the interested reader.
Theory
Critical Conceptualism and the Software of Human Reason. Research in environmental science and technology is confronted with two main types of problems: empirical and conceptual. Geospatial mapping of the water table underlying the Cherry Point site (NC) is an empirical problem. The relevant evidence consists of facts about the geohydrology of the region, the topography, etc. One may not be able to derive a definite solution to this problem, but this is not due to any doubt about the concepts involved; it is rather due to an inability to understand the available facts, or to these facts being incomplete and inconclusive. On the other hand, whether hydraulic conductivity can be viewed as a random field is a conceptual problem. To study this problem, we need to examine the physical characteristics of hydraulic conductivity across space-time and compare them with the mathematical features of a random field in order to decide whether they are compatible. Generally, conceptual problems can have more serious consequences in environmental research than empirical anomalies. Often it is not the physical world that determines the evolution of concepts and ideas but rather concepts and ideas that generate scientific and technological development.
Significant conceptual issues arise when seeking ways to make scientific inferences. To solve an environmental problem, it is not sufficient to know the relevant physical laws; one must also know how to use them in an appropriate inferential context. This context is linked to the software of human reason rather than to its hardware (which, instead, is associated with ontologic issues and empirical investigation). Many studies have been based on the concept that an environmental system is deterministic, which put serious constraints on both the kinds of hypotheses open to the scientists and the real-world situations that could be handled by their techniques. Other studies have been based on conceptual structures that neglected the importance of space-time cross-variations and physical interactions, thus generating essentially noninformative predictions. Generally, many conceptual problems of an environmental model or technique either stem from internal inconsistencies or arise from conflict with a core theory or doctrine that is well founded in the natural sciences. Until they are resolved, the conceptual problems of a modeling approach can raise serious doubts about its scientific soundness and problem-solving efficacy.
In view of these considerations, the present work discusses a critical conceptualism methodology that focuses on the application of the laws of stochastic logic to environmental problems. In accordance with critical conceptualism, concepts and knowledge sources are linked by critical reasoning to provide rational support to space-time modeling and prediction (8). The methodology involves the metaphorical use of concepts, principles, and mechanisms underlying natural systems and provides the rational means for extending intuition into realms beyond everyday beliefs and raw experience (the framework makes a clear distinction between a law of logic and a scientific inference drawn from this law, etc.). Figure 1(ii) outlines the basic steps of the critical conceptualism methodology, which constitutes a considerable improvement over the conventional reasoning of Figure 1(i) (these steps are discussed in more detail in the following sections). We start by distinguishing between general and site-specific physical knowledge. Then, using a teleologic (purpose-oriented) criterion, a hypothesis (e.g., a space-time probability model) is generated that is consistent with the available general knowledge. What is at issue in this step is the teleology of reason rather than that of Nature (9).
Logic integration rules are then used to revise the hypothesis in light of site-specific data, leading to an updated hypothesis that is significantly more informative than the corresponding hypothesis of Figure 1(i). Also, Figure 1(ii) considers a richer variety of knowledge sources than the conventional framework processes. Predictions based on the updated hypothesis are compared with experimental results at selected space-time domains; if they are in good agreement, the procedure is confirmed, otherwise it is falsified and one must return to a previous step. The cause of a falsification may not be attributable to a single element (e.g., the theoretical model) but rather to several different elements of critical conceptualism, i.e., theoretical models, auxiliary assumptions, experimentation techniques, etc. All of these elements are potential sources of falsification and must be investigated in the process of obtaining updated predictions.
In the critical conceptualism context of Figure 1(ii), uncertainty is far more than a technical notion reflecting measurement errors and observation biases in environmental systems (10). Uncertainty may refer to an epistemic situation (describing one's state of incomplete knowledge of the environment) or to an ontologic situation (representing objective aspects of reality). Because of a number of uncertainty sources related to the real-world situation, the governing physical laws manifest themselves in a complex manner that can be described only in stochastic terms, generating a range of possible values (realizations) together with the probability of their occurrence (11). Powerful representations of space-time variations and physical dependencies are derived in terms of spatiotemporal random fields, which can model spatially heterogeneous and temporally nonstationary systems (4).
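The idea of a random field as a generator of many possible realizations, each with an associated probability, can be sketched numerically. The Python example below is illustrative only: it assumes a Gaussian field over points p = (s, t) with a constant mean and a separable exponential space-time covariance (the form, parameter values, and grid are choices made here; the framework itself does not require normality or separability), and it draws three realizations by Cholesky factorization.

```python
import numpy as np

def separable_cov(s, t, sill=1.0, a_s=2.0, a_t=3.0):
    """Separable space-time covariance C = sill * exp(-|ds|/a_s) * exp(-|dt|/a_t).

    The separable exponential form and its parameters are illustrative assumptions.
    """
    ds = np.abs(s[:, None] - s[None, :])
    dt = np.abs(t[:, None] - t[None, :])
    return sill * np.exp(-ds / a_s) * np.exp(-dt / a_t)

# Space-time points p = (s, t): 21 positions x 5 times, flattened into one list.
S, T = np.meshgrid(np.linspace(0.0, 10.0, 21), np.arange(5.0), indexing="ij")
s, t = S.ravel(), T.ravel()

C = separable_cov(s, t)
L = np.linalg.cholesky(C + 1e-10 * np.eye(s.size))   # small jitter for numerical safety
gen = np.random.default_rng(42)
mean = 5.0
# Each column is one realization of X(p); all columns are drawn from the same pdf.
realizations = mean + L @ gen.standard_normal((s.size, 3))
```

No single column is "the" answer; the random field is the whole ensemble together with its probability law.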
Let the vector p = (s, t) define a point in the space-time domain (s is a spatial position vector and t denotes time), and associate an environmental variable (e.g., contaminant concentration or fluid flow pattern) with each p. The random field X(p) offers a stochastic representation of the distribution of the variable across space-time in terms of its probability density function (pdf), f(χ) (12). Unlike the environmental variable, which can be measured with an appropriate instrument, the pdf cannot be measured by any instrument. Instead, the pdf f(χ) characterizes X(p) at a point p by providing an epistemic assessment of the probability that X(p) = χ at p on the basis of the available physical knowledge. The pdf peak, say f(χ*), corresponds to the value X(p) = χ* with the highest probability. If several peaks of the same magnitude exist, {f(χ*_i); i = 1, 2, ...}, the corresponding values {X(p) = χ*_i; i = 1, 2, ...} are equally probable (which does not mean that they can all occur at once).
Critical conceptualism relies on epistemic objectivity, which makes a twofold claim: it is the critical reasoning framework one uses to look at the environment that creates the picture of the environment, and one may prefer a specific framework over another on the basis of a set of objective rules (8, 13). Epistemic objectivity can improve matters by developing the logical core of the scientific method underlying space-time modeling and prediction. For example, predictions obtained within a logic framework that depends on physical theory are more informative than those of classical statistics, which is based on a physical theory-free interpretation of facts.
General and Specificatory Knowledge Bases. An important component of critical conceptualism is the idea of a knowledge base (KB). A KB denotes a collection of information sources relevant to the problem at hand, which are invoked by a reasoning process aimed at solving the problem. Critical conceptualism is an open process that blends several types of environmental KB and multidisciplinary information sources. A useful classification of KB in environmental science is as follows (12):
a. general KB (G), i.e., theoretical models developed for well-defined conceptual environments, including fundamental natural laws, scientific principles, and primitive equations; and
b. specificatory KB (S), i.e., site-specific details of the real environment, including hard data, uncertain information sources, probabilistic logic, empirical charts, categorical variables, and fuzzy sets.
The union of G and S is the total KB, denoted by K. Certain elements of the G-KB are often associated with science seeking to deepen insight at a fundamental level, in a closed system context. The S-KB usually refers to action science aiming at the predictive precision of an open system. In such a system, the input parameters are incompletely known, simplifying assumptions of varying validity are used, uncertain influences and interdependencies exist, and auxiliary hypotheses may cancel each other out. Furthermore, for the G-KB to have practical meaning in an open system context, certain aspects need to be clarified, such as the specific domain of application of a law, the error bounds within which the law's predictions are acceptable, and its consistency with the site-specific data (as expressed in the S-KB). Thus, a stochastic formulation of the system reflects either the epistemic situation under consideration (e.g., lack of certain knowledge and detailed information) or the nature of the problem we wish to solve (e.g., seeking information about the mortality rate of a population exposed to a pollutant rather than seeking to predict whether an individual who had been exposed will survive). The two KB are closely related to each other (e.g., the scientific paradigm of the G-KB affects the gathering of case-specific data in the S-KB). In view of these considerations, in real-world studies we often seek the rational means to integrate basic closed system science and site-specific uncertain data into a single framework for environmental modeling and prediction purposes.
This is the task of knowledge synthesis. Before proceeding with knowledge synthesis, however, let us define the environmental prediction and mapping problem in light of the critical conceptualism methodology.
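Before turning to that problem, the G/S classification above can be made concrete as a small data model. The Python sketch below is purely illustrative: all class and field names (GeneralKB, SpecificatoryKB, TotalKB, HardDatum, IntervalDatum, ProbabilisticDatum) are hypothetical, space is reduced to one coordinate, and only three of the S-KB types listed above (hard, interval, and probabilistic data) are represented.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]  # p = (s, t) with a scalar spatial coordinate (simplifying assumption)

@dataclass
class HardDatum:
    p: Point
    value: float                      # exact measurement chi_hard

@dataclass
class IntervalDatum:
    p: Point
    lower: float                      # chi_soft known only to lie in [lower, upper]
    upper: float

@dataclass
class ProbabilisticDatum:
    p: Point
    pdf: Callable[[float], float]     # soft pdf f_S(chi) at p

@dataclass
class GeneralKB:
    """G-KB: laws or moment functions stored by name (hypothetical structure)."""
    laws: Dict[str, Callable] = field(default_factory=dict)

@dataclass
class SpecificatoryKB:
    """S-KB: site-specific hard and soft data."""
    hard: List[HardDatum] = field(default_factory=list)
    interval: List[IntervalDatum] = field(default_factory=list)
    probabilistic: List[ProbabilisticDatum] = field(default_factory=list)

@dataclass
class TotalKB:
    """K = G union S."""
    G: GeneralKB
    S: SpecificatoryKB

    def data_points(self) -> List[Point]:
        """Points p_data at which some site-specific information is available."""
        return ([d.p for d in self.S.hard]
                + [d.p for d in self.S.interval]
                + [d.p for d in self.S.probabilistic])

# Hypothetical example: a constant-mean law plus one datum of each soft/hard type.
K = TotalKB(
    G=GeneralKB(laws={"mean": lambda p: 10.0}),
    S=SpecificatoryKB(
        hard=[HardDatum((0.0, 0.0), 10.2)],
        interval=[IntervalDatum((5.0, 1.0), 6.0, 9.0)],
        probabilistic=[ProbabilisticDatum(
            (10.0, 2.0), lambda x: math.exp(-0.5 * (x - 6.0) ** 2) / math.sqrt(2 * math.pi))],
    ),
)
points = K.data_points()
```

Keeping G and S in separate containers mirrors the separation of Stage 1 (general knowledge) from Stage 2 (site-specific knowledge) in knowledge synthesis.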
The Environmental Prediction and Mapping Problem. The methodology discussed above can have a decisive influence on the formulation of the space-time prediction problem itself as well as on the nature of any possible solution to it. One is often concerned with the prediction of the environmental field X(p) at a network of points p_k, given certain core knowledge and a set of application-specific data χ_data at points p_data. For example, a long-term, consistent record of ozone distribution across space is essential to understanding and predicting ozone depletion (14). At the points p_k, either there are no observations at all, or the available data are considerably uncertain and cannot be used as reliable predictions of the actual values at these points. Given the total physical KB, K, considered above, one seeks to derive the pdf f_K(χ_k) that characterizes X(p) at every node p_k of the mapping grid. The pdf is a key factor in this respect: predictions χ̂_k of the environmental field at any set of grid nodes are derived from the pdf at the same nodes in terms of a suitable criterion. The choice of criterion is not unique; it depends on the goals of the study. In some situations the criterion seeks the most probable prediction, in which case χ̂_k = χ_k,mode; in others, the prediction is the mean, χ̂_k = χ_k,mean; etc. The resulting maps provide detailed representations of the distribution of the environmental field in space-time. Furthermore, various measures of prediction uncertainty can be derived from f_K(χ_k) (see ref 12).
Knowledge Synthesis. Knowledge synthesis (KS) is that part of the critical conceptualism methodology that refers to the epistemic integration of physical KB to obtain a realistic representation of the phenomenon across space-time, assess important uncertainty sources, evaluate relevant risks, and make science-based decisions.
The adoption of a stochastic approach allows KS to remain fluid and flexible in receiving and processing new, and often uncertain, information. KS is of paramount importance in environmental science and technology, since the solution of a large number of problems is essentially a KS affair. The KS framework of the present work is an attempt to integrate the G-KB with the S-KB in order to generate realistic and informative probability models across space and time. The key issue is how to perform this integration in a physically meaningful, mathematically rigorous, and epistemically sound manner. In response to this challenge, the KS framework assumes three basic epistemic stages of knowledge acquisition and processing, as follows (mathematical details can be found in refs 12 and 15):
Stage 1: This structural stage is concerned with the space-time structure of the environmental system. The G-KB is transformed into a set of equations in terms of a structural probability model f_G, and a teleologic solution of these equations is sought involving the concept of maximum expected information.
Stage 2: The passage from the general to the particular knowledge state involves the consideration of the S-KB at the specificatory stage. This stage is concerned with the KB representing the site-specific aspects of the natural system and how these KB can be put in an operational form suitable for mathematical analysis and processing.
Stage 3: In the integration stage, the results of the previous stages are blended by means of a stochastic logic system, leading to the final solution in terms of the updated probability model f_K (integration or posterior pdf) across space and time.
Stages 1-3 involve a holistic aspect: space-time modeling and prediction of the natural system as a whole cannot be achieved solely from knowledge of its parts; they emerge from the integrated whole itself.
FIGURE 2. (i) Locations where hard ozone data (triangles) and soft (probability) ozone data (circles) were generated. (ii) Scatter plot of TO3 measurements vs tropopause pressure. A physical equation is fitted to the data, from which soft probability models, f_S, can be derived (three of these models are shown for illustration). (iii) Reference map of TO3 fluctuations (in DU) obtained from the TOMS instrument on July 6, 1988. A major ozone event occurred in the surface layer over the eastern half of the United States during July 2-11, 1988; on July 6, 1988, the event was at its peak. The white spots indicate areas where no data were available.

The teleologic solution sought in Stage 1 often involves the Shannon information measure; another solution could use the Fisher measure (15, 16). In Stage 2, site-specific databases of varying levels of uncertainty are considered and transformed into an operational form. In Stage 3, the G-based solutions are revised through application of the logic system to yield updated pdf models that are consistent with the S-KB of the previous stage (the KS framework is very general, allowing the use of different logic systems, including statistical inductive inference and stochastic deductive inference). KS efficiently blends concepts and techniques from different disciplines to produce a new structure that shows the influence of the ancestor concepts and techniques without being a mere “cut-and-paste” combination (9). Of course, not all types of KB can be blended (e.g., one may find it meaningless to integrate theories from celestial mechanics and environmental epidemiology). Thus, for the KS framework to make sense, the corresponding KB must share certain concepts and referents (the actual things in the real world to which concepts point).
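For Stage 1 in its simplest form, a single point whose G-KB consists of a mean and a second moment, the teleologic (maximum Shannon entropy) solution can be computed numerically. The Python sketch below works under those assumptions; the grid, the moment values (mean 5, second moment 29, i.e., variance 4), and the Newton iteration with backtracking are choices made here. It finds coefficients μ such that f_G ∝ exp(μ1 χ + μ2 χ²) reproduces the prescribed moments, and it recovers the Gaussian N(5, 4), for which μ1 = 1.25 and μ2 = −0.125.

```python
import numpy as np

def maxent_pdf(chi, g_funcs, g_bar, iters=100, tol=1e-9):
    """Maximum-entropy pdf on a uniform grid matching the expectations g_bar of g_funcs.

    Returns f_G(chi) proportional to exp(sum_a mu_a g_a(chi)) and the coefficients mu;
    the normalization constant plays the role of mu_0.
    """
    dx = chi[1] - chi[0]
    g_vals = np.stack([g(chi) for g in g_funcs])      # g_a evaluated on the grid, shape (N, M)
    g_bar = np.asarray(g_bar, dtype=float)

    def density(mu):
        logw = mu @ g_vals
        w = np.exp(logw - logw.max())                 # shift avoids overflow
        return w / (w.sum() * dx)

    def dual(mu):
        # Convex dual log Z(mu) - mu . g_bar; its gradient is E[g] - g_bar.
        logw = mu @ g_vals
        c = logw.max()
        return c + np.log(np.exp(logw - c).sum() * dx) - mu @ g_bar

    mu = np.zeros(len(g_funcs))
    for _ in range(iters):
        f = density(mu)
        Eg = g_vals @ f * dx
        resid = Eg - g_bar                            # moment-matching residuals
        if np.max(np.abs(resid)) < tol:
            break
        cov = (g_vals * f) @ g_vals.T * dx - np.outer(Eg, Eg)   # Hessian of the dual = Cov[g]
        step = np.linalg.solve(cov, resid)
        t, d0 = 1.0, dual(mu)
        while dual(mu - t * step) > d0 and t > 1e-8:
            t /= 2                                    # backtracking keeps the dual decreasing
        mu = mu - t * step
    return density(mu), mu

chi = np.linspace(-15.0, 25.0, 8001)
f_G, mu = maxent_pdf(chi, [lambda x: x, lambda x: x**2], [5.0, 29.0])
```

Here the Gaussian emerges from the stated knowledge rather than being assumed; a richer G-KB (e.g., higher-order moments) can lead to non-Gaussian f_G through the same procedure.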
FIGURE 3. Locations of collected well samples in Bangladesh (categorized as hard and soft data).
Teleologic Solution and Logic Integration. An important function of critical conceptualism is to express methodological arguments (Stages 1-3 above) in terms of rigorous equations. The two fundamental KS equations of space-time modeling and prediction are as follows (for mathematical derivations and other technical details, see ref 12):
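$$\int d\chi_{map}\, \mathcal{G}(g_\alpha)\, e^{\boldsymbol{\mu}^T \mathbf{g}} = 0, \qquad \alpha = 0, 1, \ldots, N \qquad (1)$$

and

$$f_K(\chi_k) = \Theta_K\left(e^{\boldsymbol{\mu}^T \mathbf{g}}\right) \qquad (2)$$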
Equations 1 are called the teleologic equations, whereas eq 2 is called the integration equation. Some clarification is necessary regarding the KS stages leading to eqs 1 and 2. As mentioned before, the functional form of the solution in Stage 1 depends on the information concept assumed. In deriving eq 1, the Shannon information was used in terms of the random field X(p) in space-time. The vector χ_map = (χ_data, χ_k) is a space-time realization of the geographotemporal map. The G in eqs 1 denotes an operator involving functions g_α (α = 0, 1, ..., N) properly chosen to express the general KB considered. Some examples are given in the “Materials and Methods” section below (for a detailed account of G-operators, see ref 12). The g_α-functions form a vector g = {g_α; α = 0, 1, ..., N}, whereas μ is a vector of coefficients μ_α associated with g. The μ_α coefficients, which are functions of the space-time coordinates, can be computed by solving the teleologic eqs 1. Note that the structural pdf of Stage 1, based on the G-KB, is given by f_G(χ_map) = exp(μᵀg). f_G implies that the χ-values at different points p are (nonlocally) connected, as induced by the g_α-functions. The consideration of the S-KB in Stage 2 leads to the formulation of a set of well-defined operators Ξ_S (16). The S-KB may consist, e.g., of hard environmental data χ_hard at a set of points and soft (uncertain or secondary) data χ_soft at another set of points, so that χ_data = (χ_hard, χ_soft); examples are given in the “Materials and Methods” section below. In Stage 3, two groups of logic integration techniques are considered: (i) operational Bayesian conditionalization (bc) techniques and (ii) deductive conditionalization techniques. Group i is based on inductively strong standards, whereas group ii is based on deductively sound principles. The choice of an adequate conditionalization rule is primarily a conceptual modeling affair supported by the physical and logical features of the situation.

FIGURE 4. (i) Kalamazoo River region in Michigan. (ii) Traditional framework vs (iii) KS-based framework for solving PDE (4). PDE = partial differential equation, TS = traditional solution (using standard methods), TP = teleologic principle, IP = integration principle, KB = knowledge base, and PDF = probability density function.

Operational bc is a versatile approach that uses knowledge-based probability operators. In the case of eq 2, the corresponding integration operator is given by

$$\Theta_K = \Theta_K^{bc} = A^{-1}\int_D d\Xi_S(\chi_{soft})$$

where A is a normalization coefficient independent of χ_k, and the forms of the integrand Ξ_S and of the domain D depend on the types of assimilated soft data χ_soft (for specific examples of A, Ξ_S, and D, see refs 12 and 16). The deductive random field theory (17), on the other hand, considers various shades of the space-time relationship suggested by the laws of nature and yields non-Bayesian rules that establish causal relevance in the physical sense. When describing objective physical constraints in the environment, the causal relationships are ontologic, whereas the probabilistic relationships are epistemic, reflecting what we know about the environment. In many applications, a useful stochastic rule of deduction is material biconditionalization (mb; 15), in which case the corresponding integration operator is

$$\Theta_K = \Theta_K^{mb} = (2A - 1)^{-1}\left(2A\,\Theta_K^{bc} - \int d\chi_{data}\right)$$

The final solution of the KS-based eqs 1 and 2 is a mathematical quantity f_K (a pdf model) that offers a complete stochastic characterization of X(p). This solution is considerably richer than a traditional solution, which extracts from the algebra a unique X(p)-value at each specified space-time point. As we will see in the “Materials and Methods” section below, several pdf models f_K are produced throughout the space-time domain.

FIGURE 5. Map of BME predictions of TO3 (in DU) using both hard (the locations shown as triangles) and soft data.

In environmental practice, well-known products of the KS framework above are, among others, the BME technique (Bayesian Maximum Entropy; the integration operator is Θ_K^bc) and the MbME technique (Material biconditional Maximum Entropy; the integration operator is Θ_K^mb) of space-time prediction (12, 15, 16). These two techniques are used in the applications of the following section. Practical successes of the KS-based techniques (e.g., refs 12 and 14-22) should be attributed to the considerable improvements of the underlying critical conceptualism methodology compared with the reasoning structure of conventional prediction techniques. By comparison with the conventional techniques, the KS-based techniques are more (1) open (they assimilate several types of knowledge and multidisciplinary information); (2) general (they avoid restrictive model assumptions such as linearity, normality, and physical model-independence); (3) informative (they generate complete pdfs across space-time instead of single predictions); (4) accurate (they produce smaller prediction errors across space-time); and (5) nested (conventional techniques are special cases of limited application within a more general framework, which demonstrates the considerable generality of critical conceptualism).
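The bc integration operator can be illustrated with one estimation point p_k and one soft datum. The Python sketch below assumes a bivariate normal f_G (the maximum-entropy form when the G-KB is a common mean, variance, and correlation; all numerical values are invented), evaluates f_K(χ_k) = A⁻¹∫_D dΞ_S(χ_soft) f_G(χ_k, χ_soft) by grid quadrature for an interval datum (dΞ_S = dχ_soft over D) and for a probabilistic datum (dΞ_S = f_S dχ_soft), and then extracts BMEmode and BMEmean predictors. It is a toy for intuition, not the BME software used in the applications, and it does not implement the MbME operator.

```python
import numpy as np

def f_G(xk, xs, m=5.0, var=4.0, rho=0.8):
    """Structural pdf f_G(chi_k, chi_soft): bivariate normal with common mean/variance
    and correlation rho (assumed values)."""
    q = ((xk - m) ** 2 - 2 * rho * (xk - m) * (xs - m) + (xs - m) ** 2) / (var * (1 - rho**2))
    return np.exp(-0.5 * q) / (2 * np.pi * var * np.sqrt(1 - rho**2))

def bc_posterior(chi, soft_density):
    """Operational Bayesian conditionalization on a grid:
    f_K(chi_k) = A^{-1} * integral of f_G(chi_k, chi_soft) dXi_S(chi_soft),
    with dXi_S = soft_density(chi_soft) dchi_soft."""
    d = chi[1] - chi[0]
    XK, XS = np.meshgrid(chi, chi, indexing="ij")
    unnorm = (f_G(XK, XS) * soft_density(XS)).sum(axis=1) * d   # integrate over chi_soft
    A = unnorm.sum() * d                                        # normalization, free of chi_k
    return unnorm / A

def predictors(chi, f):
    """BMEmode, BMEmean, and posterior standard deviation derived from f_K."""
    d = chi[1] - chi[0]
    mean = np.sum(chi * f) * d
    std = np.sqrt(np.sum((chi - mean) ** 2 * f) * d)
    return chi[np.argmax(f)], mean, std

chi = np.linspace(-5.0, 15.0, 801)
interval = lambda xs: ((xs >= 8.0) & (xs <= 10.0)).astype(float)      # D = [8, 10], dXi_S = dchi_soft
prob = lambda xs: np.exp(-0.5 * (xs - 9.0) ** 2) / np.sqrt(2 * np.pi)  # f_S = N(9, 1), dXi_S = f_S dchi_soft

f_int = bc_posterior(chi, interval)
f_prob = bc_posterior(chi, prob)
mode_i, mean_i, std_i = predictors(chi, f_int)
mode_p, mean_p, std_p = predictors(chi, f_prob)
```

For the probabilistic datum every ingredient is Gaussian, so the result can be checked by hand: f_K is then normal with mean 7.56 and variance about 1.95. The interval datum yields a non-Gaussian f_K, for which the mode and mean need not coincide, which is why the choice of predictor matters.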
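Materials and Methods
Atmospheric Ozone Data Sets. The study of total ozone (TO3) concentrations in the atmosphere (in Dobson units, DU) offers an interesting practical comparison between the KS-based BME technique of the preceding section and a commonly used conventional technique. The basic steps of the BME technique discussed above are summarized schematically as follows:

$$\mathcal{G} \xrightarrow{\text{eq 1}} \boldsymbol{\mu}, \qquad \left.\begin{matrix} \boldsymbol{\mu} \\ \Xi_S \end{matrix}\right\} \xrightarrow{\text{eq 2}} f_K^{bc} \longrightarrow \left\{\begin{matrix} \chi_{k,mode} \\ \chi_{k,mean} \\ \vdots \end{matrix}\right. \qquad (3)$$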
In accordance with eq 3, the G-operator is first formulated from the available general KB. This application accounted for 1-point and 2-point (noncentered) moments across space-time, in which case

$$\mathcal{G} = g_\alpha - \overline{g_\alpha}: \quad \chi_i - \overline{x_i}, \quad \chi_i^2 - \overline{x_i^2}, \quad \chi_i\chi_j - \overline{x_i x_j} \quad \text{for all points } p_i, p_j$$

where the bar denotes stochastic expectation. Then G is substituted into eq 1, which is solved for the vector μ. Next, the Ξ_S-operator is constructed from the available S-KB. In this case, the S-KB included data sets generated by measuring instruments (TOMS = Total Ozone Mapping Spectrometer) on board the Nimbus 7 satellite (14). In addition to hard (exact) ozone measurements, uncertain data and secondary (soft) information in terms of empirical TO3-tropopause pressure equations were available across space. Figure 2(i) depicts the points where hard ozone data and soft information were available.
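The soft pdfs f_S in the ozone application come from an empirical TO3-tropopause pressure relation. As a hedged sketch, assuming a linear law TO3 ≈ a + b·P with Gaussian scatter (the functional form of the fitted equation is not stated here) and using synthetic data, the Python example below fits the relation by least squares and turns it into a soft pdf at a location where only the tropopause pressure is known.

```python
import numpy as np

def fit_soft_model(pressure, to3):
    """Least-squares fit of an assumed linear law TO3 = a + b * P; returns (a, b, sigma)."""
    X = np.column_stack([np.ones_like(pressure), pressure])
    coef, *_ = np.linalg.lstsq(X, to3, rcond=None)
    resid = to3 - X @ coef
    sigma = np.sqrt(resid @ resid / (len(to3) - 2))   # residual scatter about the law
    return coef[0], coef[1], sigma

def soft_pdf(p_value, a, b, sigma):
    """Gaussian soft pdf f_S(chi) for TO3 where only the pressure p_value is known."""
    mean = a + b * p_value
    return lambda chi: np.exp(-0.5 * ((chi - mean) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Synthetic data (invented for illustration): pressure in hPa, TO3 in DU.
rng = np.random.default_rng(1)
P = rng.uniform(100.0, 300.0, 200)
to3 = 200.0 + 0.5 * P + rng.normal(0.0, 10.0, P.size)

a, b, sigma = fit_soft_model(P, to3)
f_S = soft_pdf(250.0, a, b, sigma)   # soft pdf at a point with P = 250 hPa
```

The residual scatter of the empirical law becomes the width of f_S, so a poorly fitting law automatically yields less informative soft data.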
Figure 2(ii) gives a typical plot of the empirical TO3-tropopause pressure law at a representative point, together with the associated soft pdf f_S. Given this S-KB, the corresponding operator is such that dΞ_S = f_S dχ_soft. Finally, μ, G, and Ξ_S are substituted into eq 2 to obtain the integration pdf model f_K^bc at all points of interest. From this model, predictors such as the BMEmode (χ_k,mode) and the BMEmean (χ_k,mean) can be derived (see “Results and Discussion” below). The reference TOMS map is plotted in Figure 2(iii) for later comparison.
Data Base of Arsenic in Bangladesh Drinking Water. Arsenic (As) is a known toxic and carcinogenic substance. It is found in groundwater in reduced (As(III)) and oxidized (As(V)) forms. Its presence can be natural, as in Bangladesh, where As contained in the sediments dissolves into the aquifers. Arsenic in Bangladesh groundwater has been the topic of a considerable number of studies (22, 23). Hence, an interesting application of the KS techniques is the accurate representation of the spatial variation of As concentrations in Bangladesh drinking water. KS does not regard empirical support as a simple two-place relation between theory and evidence but rather as a three-place relation between theory, evidence, and background knowledge. In this study, variogram functions were used to describe the spatial structure of the As distribution, in which case

$$\mathcal{G} = \tfrac{1}{2}\left[(\chi_i - \chi_j)^2 - \overline{(x_i - x_j)^2}\right] \quad \text{for all } p_i, p_j$$

Also, hard measurements and soft data (interval functions of various widths) from 3534 wells throughout Bangladesh during the period 1998-1999 were obtained and processed (dΞ_S = dχ_soft). The locations of the well samples are shown in Figure 3. The original database was provided by the British Geological Survey (for a detailed description, see ref 23).
Knowledge Bases of Dynamic Pollutant Distribution.
Critical conceptualism views the study of physical laws as a knowledge synthesis process rather than as the direct solution of a system of equations. This is consistent with the open system view of real world situations, as opposed to the closed system assumed in basic law investigations. Consider the stochastic advection-reaction equation representing pollutant distribution in space-time (21)
$$\frac{\partial X(p)}{\partial t} + \nu \cdot \nabla X(p) = -\kappa X(p) \qquad (4)$$
where the random field X(p) denotes pollutant concentration, ν is a flow velocity vector, and κ is the reaction rate constant. This partial differential equation (PDE) may be used to represent the distribution of PCB (polychlorinated biphenyl) concentrations introduced into the Kalamazoo River in Western Michigan, United States [Figure 4(i)], through river bank disposal by the paper industry (24). PCBs are among the most toxic pollutants and have long-lasting effects. X(p) is measured in ppm, p = (s, t) (s is the spatial coordinate along the river and t is time), ν = 10 km day⁻¹, κ = 0.25 day⁻¹, and the initial random concentration is X(0,0) ∼ N(10 ppm, 2 ppm²). In the case of eq 4, the G-operator comprises the terms $\chi_i(\partial/\partial t_i + \nu\,\partial/\partial s_i + \kappa)$, $\chi_i^2(\partial/\partial t_i + \nu\,\partial/\partial s_i + 2\kappa)$, and $\chi_i\chi_j(\partial/\partial t_i + \nu\,\partial/\partial s_i + \kappa)$ for all points $p_i$, $p_j$. The soft S-KB included interval and probability data. A methodological comparison of the traditional and the KS frameworks for studying PDE (4) is shown in Figure 4 parts (ii) and (iii), respectively. Conventional PDE solutions of eq 4 [Figure 4(ii)] can be unrealistic, since they do not necessarily satisfy the uncertain site-specific databases available and they offer a single pollutant value at each point. The KS framework [Figure 4(iii)], instead, makes it possible to study eq 4 by means of rational principles that account for various kinds of core knowledge and uncertain case-specific data, and it produces informative probability distributions of the pollutant across space-time.
Results and Discussion
FIGURE 6. (i) Frequency distribution of spatiotemporal prediction errors of TO3 mapping obtained by BME (plain line) and spatial kriging (dashed line). (ii) Scattergram of TOMS data fluctuations vs kriging predictions. (iii) Scattergram of TOMS data fluctuations vs BME predictions.
FIGURE 7. Map of the MbME predictions showing the trends of As concentrations across space (in µg/L).
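The accuracy comparison summarized in Figure 6 and discussed in the ozone results below relies on standard error statistics: mean error (bias), mean square error, mean absolute error, and the correlation coefficient between predictions and reference values. The following Python sketch computes them for paired values and reproduces the relative improvements quoted for the TO3 comparison. The function names `error_stats` and `relative_reduction` are hypothetical, and the sign convention for the prediction error (predicted minus reference) is an assumption, since the text does not state it.

```python
import math


def error_stats(predicted, reference):
    """Error statistics for paired predictions and reference values.

    Prediction error is taken as predicted - reference (assumed sign convention).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(predicted)
    errors = [p - r for p, r in zip(predicted, reference)]
    me = sum(errors) / n                      # mean error (bias)
    mse = sum(e * e for e in errors) / n      # mean square error
    mae = sum(abs(e) for e in errors) / n     # mean absolute error
    mp, mr = sum(predicted) / n, sum(reference) / n
    cov = sum((p - mp) * (r - mr) for p, r in zip(predicted, reference))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sr = math.sqrt(sum((r - mr) ** 2 for r in reference))
    corr = cov / (sp * sr)                    # Pearson correlation coefficient
    return {"ME": me, "MSE": mse, "MAE": mae, "r": corr}


def relative_reduction(worse, better):
    """Fractional reduction in magnitude when going from `worse` to `better`."""
    return (abs(worse) - abs(better)) / abs(worse)


# Ratios implied by the kriging vs BME values quoted in the text.
mse_factor = 110 / 26.5                            # ≈ 4.15 ("about four")
bias_reduction = relative_reduction(-1.96, -0.79)  # ≈ 0.60
mae_reduction = relative_reduction(6.44, 3.58)     # ≈ 0.44
```

Only the summary figures reported in the text are used here; the TOMS fluctuations and the kriging and BME predictions themselves would have to be supplied to `error_stats`.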
Global Mapping of Total Atmospheric Ozone. Figure 5 shows the map obtained by means of the KS-based BME technique. As can be seen by comparing this BME map with the reference TO3 map [Figure 2(iii)], BME offers a realistic representation of the TO3 distribution, leading to accurate TO3 predictions across space. A series of TO3 maps can be generated in time, thus providing an informative representation of the composite space-time TO3 variation (14). To compare the accuracy of BME with that of one of the most commonly used conventional interpolation techniques (i.e., geostatistical space-time kriging (6)), we calculated the differences between the predicted TO3 values (using BME as well as kriging) and the reference TO3 fluctuations at all data points at which ozone measurements were available from TOMS [Figure 2(i)]. The histograms of the prediction errors of BME and kriging are shown in Figure 6(i). The BME histogram is narrower than the kriging histogram and has a sharper peak around zero prediction error, which implies that small prediction errors occur much more frequently with BME than with kriging. In addition, the mean square error (i.e., the average of the squared prediction errors) drops from 110 DU² (space-time kriging) to 26.5 DU² (BME), which corresponds to an improvement in precision by a factor of about four through the use of BME. Another measure of error, indicating bias, is the mean error, i.e., the plain average of the prediction errors. The mean error is equal to −1.96 DU for kriging (indicating a slight bias) and −0.79 DU for BME (i.e., a bias about 60% smaller in magnitude for BME). The corresponding mean absolute errors are 6.44 DU (kriging) and 3.58 DU (BME), i.e., a difference of 44% in favor of BME. In Figure 6 parts (ii) and (iii) we plot the scattergrams of the TOMS data fluctuations vs the corresponding kriging and BME predictions, respectively. BME shows a much better correlation with the reference TOMS values than kriging: while kriging yields a correlation coefficient of moderate value (0.66), BME leads to a much higher correlation coefficient of 0.92.
Mapping of Arsenic Concentrations in Bangladesh Drinking Water. KS techniques can exploit the KB available to yield a wealth of information in terms of pdfs and space-time maps. Using the MbME technique of KS discussed in the “Theory” section above, we can derive a pdf $f_K^{mb}$ at each point. The $f_K^{mb}$ contains all the information we can obtain about the As distribution given the uncertainty characterizing the real-world system. From $f_K^{mb}$ the MbMEmean is calculated and used as a prediction of As concentration at each point. For illustration, Figure 7 plots the map of As concentrations derived by the MbME technique (maps produced by the BME technique have been presented in ref 22). Figure 7 clearly shows the existing trends in As concentrations throughout Bangladesh. The uncertainty associated with the As map of Figure 7 may be related to the high spatial variability of As and to the detection limit of the measurement instrument (prediction error is also expected to increase with distance from the sampling locations with hard data, etc.). These kinds of maps are valuable tools for a broad spectrum of end-users, including environmental protection regulators and population health managers (9).
Space-Time Mapping of Pollutants Along Kalamazoo River. While with regression methods one is accustomed to making merely statistical inferences, the KS methodology allows one to go one layer deeper into scientific inference and generate physical law-based predictions. Using BME, a recent study (21) accounted for law (4) as well as for various site-specific interval-type and probability-type data to generate the integration pdf, $f_K^{bc}$, at various times and locations along the river. One of the significant features of BME is that, in the presence of physical laws, it can implicitly incorporate multiple-point spatiotemporal moments without calculating them explicitly. For numerical illustration, a few bc densities, $f_K^{bc}$, at different times and locations along the river are plotted in Figure 8(i); probabilistic pollution data at two space-time points were assumed. Furthermore, another study (15) used the mb theory to calculate the probability densities, $f_K^{mb}$, at the same points (MbME). For illustration, the structural or prior ($f_G$), the mb ($f_K^{mb}$), and the bc ($f_K^{bc}$) densities of the pollutant are plotted in Figure 8(ii) at one representative space-time point. The bc probability is often associated with statistical inductive inference (predictions amplify the data content), whereas the mb probability is associated with stochastic deductive inference (logically valid predictions). In the case of eq 4, the physical law-based structural density has a greater effect on the mb density than on the bc density. The highly “peaked” shape of the latter is due to the strong effect of soft data. These data also affect the mb density, although not to the same extent as the bc density.
FIGURE 8. (i) BME integration (or posterior) densities ($f_K^{bc}$) at four solution nodes along the river. Probabilistic (soft) data are assumed at points p1 = (0.3 km, 0.5 days) and p2 = (0.3 km, 2.0 days). (ii) The integration (or posterior) bc ($f_K^{bc}$) and mb ($f_K^{mb}$) pdfs, and the structural or prior ($f_G$) pdf, at a representative space-time point along the river.
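The interplay between the structural (prior) density and the soft data seen in Figure 8(ii) can be illustrated with the parameters of eq 4 under strong simplifying assumptions. Along the characteristic s = νt, eq 4 reduces to dX/dt = −κX, so the mean decays as e^{−κt} and the noncentered second moment as e^{−2κt}; this is the origin of the 2κ term in the G-operator for $\chi_i^2$. The Python sketch below is not the BME or MbME computation of refs 21 and 15. It assumes a Gaussian prior built from these two moments (the maximum-entropy form under mean and variance constraints), a single Gaussian soft datum located at the estimation point itself, and a posterior taken to be the normalized product $f_G f_S$. The function names and the soft-datum values are hypothetical.

```python
import math

# Parameters quoted in the text for the Kalamazoo River example.
NU = 10.0            # flow velocity, km/day
KAPPA = 0.25         # reaction rate constant, 1/day
M0, V0 = 10.0, 2.0   # X(0,0) ~ N(10 ppm, 2 ppm^2)


def prior_moments(t):
    """Mean and variance of X at (s, t) = (NU*t, t), on the characteristic through (0, 0).

    Along ds/dt = NU, eq 4 gives dX/dt = -KAPPA*X, so the first moment decays at
    rate KAPPA and the noncentered second moment at rate 2*KAPPA.
    """
    mean = M0 * math.exp(-KAPPA * t)
    second = (V0 + M0 ** 2) * math.exp(-2.0 * KAPPA * t)
    return mean, second - mean ** 2


def normal_pdf(x, m, v):
    return math.exp(-(x - m) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def posterior_with_soft_pdf(t, soft_mean, soft_var, n=4001):
    """Normalized product f_G(x) * f_S(x) on a grid, for a Gaussian soft datum at the
    estimation point itself (hypothetical simplification).
    Returns the grid, the posterior pdf values, the posterior mean, and the posterior mode."""
    m, v = prior_moments(t)
    width = 8.0 * math.sqrt(max(v, soft_var))
    lo, hi = min(m, soft_mean) - width, max(m, soft_mean) + width
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    w = [normal_pdf(x, m, v) * normal_pdf(x, soft_mean, soft_var) for x in xs]
    area = sum(w) * dx
    pdf = [wi / area for wi in w]
    post_mean = sum(x * p for x, p in zip(xs, pdf)) * dx
    post_mode = xs[max(range(n), key=lambda i: pdf[i])]
    return xs, pdf, post_mean, post_mode
```

For example, at t = 2 days (s = 20 km) the prior is N(10e^{−0.5}, 2e^{−1}) ≈ N(6.07 ppm, 0.74 ppm²). Combined with a hypothetical soft datum N(7 ppm, 0.25 ppm²), `posterior_with_soft_pdf(2.0, 7.0, 0.25)` yields a posterior mean and mode near 6.76 ppm and a posterior variance of about 0.19 ppm², i.e., a more peaked density than the prior.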
A public domain computer library (BMElib) containing the BME techniques used in the applications above is available at the Web page of the Center for the Advanced Study of the Environment (CASE) of the University of North Carolina at Chapel Hill (http://www.sph.unc.edu/envr/case/). A version of BMElib has also been published by Springer-Verlag (16).
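The first BME step shared by the applications above, obtaining the prior from the general KB, can be sketched independently of BMElib. Assuming that eq 1 is the usual moment-matching system, i.e., $f_G \propto \exp(\sum_\alpha \mu_\alpha g_\alpha)$ with $\int g_\alpha f_G\, d\chi = \overline{g_\alpha}$ (µ_0 absorbed into the normalization), the multipliers µ can be found by Newton's method on the convex dual problem. The one-point Python sketch below uses only $g_1 = \chi$ and $g_2 = \chi^2$ on a truncated grid; the function names, grid, and starting values are illustrative assumptions, not BMElib code. With these two constraints the solution must be Gaussian, which provides a check: µ_2 = −1/(2σ²) and µ_1 = m/σ².

```python
import math


def _dual(mu1, mu2, xs, m1, m2):
    """Dual objective log Z(mu) - mu.m (up to a constant) and the model weights on the grid."""
    logw = [mu1 * x + mu2 * x * x for x in xs]
    top = max(logw)                              # shift to avoid overflow in exp
    w = [math.exp(v - top) for v in logw]
    z = sum(w)
    probs = [wi / z for wi in w]                 # discrete probabilities on the grid
    return top + math.log(z) - mu1 * m1 - mu2 * m2, probs


def maxent_prior(m1, m2, lo, hi, n=2001, iters=100):
    """Multipliers (mu1, mu2) of f_G(x) ∝ exp(mu1*x + mu2*x**2) on [lo, hi]
    such that E[x] = m1 and E[x**2] = m2 (damped Newton on the convex dual)."""
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    mu1, mu2 = 0.0, -0.5                         # arbitrary starting point
    phi, p = _dual(mu1, mu2, xs, m1, m2)
    for _ in range(iters):
        e1, e2, e3, e4 = (sum(pi * x ** k for pi, x in zip(p, xs)) for k in (1, 2, 3, 4))
        g1, g2 = e1 - m1, e2 - m2                # gradient: model minus target moments
        h11 = e2 - e1 * e1                       # Hessian: covariance of (x, x**2)
        h12 = e3 - e1 * e2
        h22 = e4 - e2 * e2
        det = h11 * h22 - h12 * h12
        d1 = (h22 * g1 - h12 * g2) / det         # Newton direction H^{-1} g
        d2 = (h11 * g2 - h12 * g1) / det
        step = 1.0
        while True:                              # halve the step until the dual decreases
            new_phi, new_p = _dual(mu1 - step * d1, mu2 - step * d2, xs, m1, m2)
            if new_phi <= phi or step < 1e-10:
                break
            step *= 0.5
        mu1, mu2 = mu1 - step * d1, mu2 - step * d2
        phi, p = new_phi, new_p
        if step * max(abs(d1), abs(d2)) < 1e-12:
            break
    return mu1, mu2
```

For example, `maxent_prior(1.0, 1.5, -6.0, 8.0)` should return multipliers close to (2, −1), i.e., $f_G \propto \exp(2\chi - \chi^2)$, the Gaussian N(1, 0.5). Newton's method suits this problem because the dual is convex with Hessian equal to the covariance of (χ, χ²); the backtracking loop guards against overshooting from a poor starting point.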
Acknowledgments
The author would like to thank Prof. Mitchell J. Small of Carnegie Mellon University for his valuable suggestions. The work was supported by grants from the National Institute of Environmental Health Sciences (P42-ES05948 & P30-ES10126), the National Aeronautics & Space Administration (6000RFQ041), and the Army Research Office (DAAG55-98-10289).
Literature Cited
(1) Hadlock, C. H. Mathematical Modeling in the Environment; Math. Assoc. of America: Washington, DC, 1998.
(2) Houghton, J. Global Warming; Cambridge University Press: Cambridge, U.K., 1997.
(3) Sarewitz, D.; Pielke, R. A., Jr.; Byerly, R., Jr. Prediction: Science, Decision Making, and the Future of Nature; Island Press: Washington, DC, 2000.
(4) Christakos, G.; Hristopulos, D. T. Spatiotemporal Environmental Health Modelling; Kluwer Acad. Publ.: Boston, MA, 1998.
(5) Cherkassky, V.; Mulier, F. Learning from Data; J. Wiley & Sons: New York, NY, 1998.
(6) Stein, M. L. Interpolation of Spatial Data; Springer-Verlag: New York, NY, 1999.
(7) Oreskes, N.; Shrader-Frechette, K.; Belitz, K. Science 1994, 263, 641.
(8) Christakos, G. In Calibration and Reliability in Groundwater Modelling: A Few Steps Closer to Reality; Kovar, K., Hrkal, Z., Eds.; IAHS Publ. 277: Oxfordshire, U.K., 2003; pp 277-285.
(9) Christakos, G. Multi-Disciplinary Systems in Uncertain Environments; Springer-Verlag: New York, NY, 2004; in press.
(10) Golley, F. B. A Primer for Environmental Literacy; Yale University Press: New Haven, CT, 1998.
(11) Christakos, G. In Encyclopedia of Environmetrics; El-Shaarawi, A. H., Piegorsch, W. W., Eds.; J. Wiley and Sons, Ltd.: Chichester, U.K., 2001; Vol. 3, pp 1290-1296.
(12) Christakos, G. Modern Spatiotemporal Geostatistics; Oxford University Press: New York, NY, 2000.
(13) Matheron, G. Estimating and Choosing; Springer-Verlag: New York, NY, 1989.
(14) Christakos, G.; Kolovos, A.; Serre, M. L.; Vukovich, F. IEEE Trans. Geosci. Remote Sensing 2003, in press.
(15) Christakos, G. Adv. Water Resour. 2002, 25, 1257.
(16) Christakos, G.; Bogaert, P.; Serre, M. L. Temporal GIS (with CD-ROM); Springer-Verlag: New York, NY, 2002.
(17) Christakos, G. Probability Theory and Mathematical Statistics (Teoriya Imovirnostey ta Matematychna Statystyka) 2002, 66, 54.
(18) Bogaert, P. Stochastic Environ. Res. Risk Assessment 2002, 16, 425.
(19) Serre, M. L.; Christakos, G. Stochastic Environ. Res. Risk Assessment 1999, 13, 1.
(20) D’Or, D.; Bogaert, P. Geoderma 2002, 112, 169.
(21) Kolovos, A.; Christakos, G.; Serre, M. L.; Miller, C. T. Water Resour. Res. 2002, 38, 1318.
(22) Serre, M. L.; Kolovos, A.; Christakos, G.; Modis, K. Risk Analysis 2003, 23, 515.
(23) Chowdhury, U. K.; Biswas, B. K.; Dhar, R. K.; Samanta, G.; Mandal, B. K.; Chowdhury, T. R.; Chakraborti, D.; Kabir, S.; Roy, S. In Arsenic Exposure and Health Effects; Chappell, W. R., Abernathy, C. O., Calderon, R. L., Eds.; Elsevier: Amsterdam, 1999; pp 165-182.
(24) Crouch, E.; Ames, M.; Green, L. A Quantitative Health Risk Assessment for the Kalamazoo River PCB Site; Cambridge Environmental Inc.: Cambridge, MA, 2001.
Received for review September 10, 2002. Revised manuscript received July 30, 2003. Accepted August 1, 2003. ES020932Y