FEATURE
Integrating Environmental Data To Meet Multimedia Challenges
Agencies are pulling together their disparate data resources for better decision making.
ELAINE L. APPLETON
Environmental protection agencies have recently begun to voice a "holistic" philosophy of environmental protection. Instead of addressing one medium or one issue at a time, organizations are developing integrated, multimedia, place-based approaches that involve local decision makers. But agencies have been stymied in their efforts to implement these strategies by a deceptively simple problem: Their databases of program-specific data are frequently incompatible with each other. Even such mundane data elements as the address of a site or facility have not traditionally conformed to a standard format so that, in many cases, regulatory databases don't agree on exactly "where on the planet" regulated facilities are located.

Resolving this incompatibility is a multimillion-dollar problem that agencies are now trying to solve by cleaning and "integrating" their environmental databases. For example, EPA is integrating several nationwide facility databases to allow multimedia assessment and compliance. This fall the Office of Water will release a prototype of a reengineered water quality database called STORET as a client-server system. The Washington State Department of Ecology has embarked on a multiyear project to link all of its environmental data resources. The new system is intended to help internal users, industry, and others easily and quickly access the information they need to make day-to-day decisions.

Other state agencies also have been attempting to better coordinate their data. In 1989, for example, the Massachusetts Executive Office of Environmental Affairs (EOEA) began creating an integrated environmental management system that connected various air, land, and water quality databases. EOEA's goal, like that of Washington State, was to create a system that would support comprehensive environmental protection and cross-media monitoring and compliance. The Environmental Protection Integrated Computer System (EPICS), which revolves around a single Oracle database containing facilities information, has been in use for several years. More recently, the Texas Natural Resource Conservation Commission has also embarked on an information integration project. So too has Oregon, which built a toxic materials tracking system that Washington State subsequently purchased and modified to meet its own needs.
Need for cost-effective analysis

The Washington Department of Ecology in Olympia is in the second year of a six- to seven-year plan to create an integrated information system that will include new databases and links to existing databases. Ultimately, the new system, which will cost at least $7 million, should help internal users and the public access the information they need to make day-to-day decisions, says information integration project manager Lynn Singleton. The project, Singleton says, will "position the agency to meet the changing environmental management needs in the state" by creating the infrastructure to support multimedia, watershed-based, local decision making. "We couldn't afford not to do it," he says.

Singleton illustrates the need for the project by citing the department's painful experience performing a statewide environmental justice evaluation in 1994. To compare U.S. census information with its own environmental data, "we had seven different data sets we wanted to collectively evaluate," recalls Singleton. "It took us nine months just to go through each of those databases and determine if the locational information was correct, to determine if Acme Industries on First Street South was the same as Acme Corporation on First St." Cleaning the data cost $65,000 and the analysis another $20,000, for a total of $85,000, considerably more than the original $29,000 budget approved by the state legislature. Once cleaned, however, the databases continue
344 A • VOL. 30, NO. 8, 1996 / ENVIRONMENTAL SCIENCE & TECHNOLOGY / NEWS
0013-936X/96/0930-344A$12.00/0 © 1996 American Chemical Society
Data scattered throughout 28 systems within the Washington State Department of Ecology are being combined into a central "Facility Site" database. The system will include information on 1800 hazardous waste and 1200 wastewater discharge permittees. It will eventually be linked to all of the databases that use facility/site information in the department.
to make useful analysis possible. A subsequent environmental justice evaluation for a county cost the department only $1000. "The moral is once you have your data in a consistent format and quality, you can go through and ask any number of questions and actually turn your data into information," says Singleton. Ultimately the agency will organize its data into 24 "business areas," for example, facilities and sites, environmental data, permitting programs, technical assistance and outreach, and budgeting and billing. Users, such as regulators or program officers, will be able to access central databases by using graphical interfaces on their personal computers, regardless of the hardware and software on which the databases are stored or where they are located. So far, the agency has created a central facilities database of 1800 hazardous waste sites and has incorporated such basic information as longitude, latitude, and regulatory contacts. It is now in the process of adding facility information for 1200 wastewater discharge permittees into the database.
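The locational cleanup Singleton describes, deciding whether "Acme Industries on First Street South" is the same facility as "Acme Corporation on First St.", can be illustrated with a minimal sketch. This is not the department's method; the abbreviation tables, field names, and matching rule below are all hypothetical, and a real cleanup would still send flagged pairs to a person for review.

```python
import re

# Hypothetical lookup tables; a real cleanup would need far larger lists.
STREET_WORDS = {
    "street": "st", "avenue": "ave", "road": "rd", "first": "1st",
    "north": "n", "south": "s", "east": "e", "west": "w",
}
NAME_NOISE = {"industries", "corporation", "corp", "company", "co", "inc"}


def _tokens(text):
    """Lowercase the text, drop punctuation, and split it into words."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()


def normalize_address(address):
    """Replace street words with a single standard abbreviation."""
    return [STREET_WORDS.get(t, t) for t in _tokens(address)]


def normalize_name(name):
    """Drop corporate suffixes so only the distinctive words remain."""
    return [t for t in _tokens(name) if t not in NAME_NOISE]


def possible_match(rec_a, rec_b):
    """Flag two records for review when the normalized names agree and
    one normalized address is a word-for-word prefix of the other."""
    if normalize_name(rec_a["name"]) != normalize_name(rec_b["name"]):
        return False
    a = normalize_address(rec_a["address"])
    b = normalize_address(rec_b["address"])
    shorter, longer = sorted((a, b), key=len)
    return bool(shorter) and longer[:len(shorter)] == shorter


record_1 = {"name": "Acme Industries", "address": "First Street South"}
record_2 = {"name": "Acme Corporation", "address": "First St."}
record_3 = {"name": "Acme Industries", "address": "Second Avenue"}

print(possible_match(record_1, record_2))
print(possible_match(record_1, record_3))
```

The prefix rule is deliberately loose: it treats a missing directional ("South") as compatible rather than contradictory, which suits a first pass whose output is a review list rather than an automatic merge.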
Previously that information was scattered throughout 28 disparate systems. Eventually, the facilities-site database will have links to all of the other databases. To avoid rewriting software code for each system, Singleton and his staff are using Composer, a software generation tool from Texas Instruments, to link existing databases to new ones. Once the system is completed, users who are "inside" the facilities-site database will be able to click on appropriate icons to access linked databases. For instance, an "air" icon will immediately access the air quality database. Users also can use geographic information systems (GISs) to answer environmental "what-if" questions by viewing data spatially.

Engineering a national facilities database

Nationally, EPA is tackling its own information integration problems. It is in the beginning stages of a "key-identifier" project to standardize and consolidate basic identification information on up to 600,000 regulated facilities, according to Linda Travers, director of the Information Management Division at
EPA's Office of Pollution Prevention and Toxics. Through the key-identifier consolidation initiative, "The agency will be able to provide access to the data by chemical and by facility," says Travers. "We'll have better public access, and we'll empower communities to use the data for whatever reasons they may have. We'll be able to do multimedia assessment and compliance." The agency plans to issue a notice requesting public comment this summer.

Travers emphasizes that EPA needs a better way to capture and store consistent and comprehensive facilities data. "We consistently hear from users inside and outside EPA that they do not have enough confidence in the various databases to feel they have all the data from a facility."

User confidence in the quality and consistency of data is also an issue in EPA's current reengineering of its massive STORET water quality database. The system, originally developed in 1964, today contains more than 100 million pieces of water quality monitoring data taken from more than one million monitoring locations. These results, which exist in a wide variety of formats and which were generated by thousands of sampling methods, are owned by 40 states, 14 federal and interstate organizations, and parts of Canada.

"To be relevant in the information age, you have to provide people with the information they expect you to have."
Lynn Singleton, Washington State Department of Ecology
Raw data from STORET are often used as input for environmental impact and assessment statements. According to Lee Manning, lead technical designer for the revamped STORET, "The widespread sharing of data has led to a lack of confidence in the existing system because the metadata [information about the origin of data] that would allow you to assess the quality of the data are not an integral part of the system. Unknown quality of data is the most frequently cited reason for not using data in computerized databases." The result of this lack of confidence is that users either redo studies needlessly or make decisions without benefit of data, says Robert King, an information systems chief in the Office of Water.

King and Manning are in charge of the $2-3 million project to turn the outmoded mainframe database into a client-server system that can be accessed by personal computer users across a network and, eventually, by authorized Internet users. (Data security is an issue yet to be worked out.) They plan to have a prototype completed by September and a production system available by July 1997. The new STORET system will include significant amounts of metadata, such as the purposes for which data were collected, people and organizations that participated in data collection projects, and quality control standards applied during collection. "These kinds of things reflect on the quality of information you collect and improve its utility to a wide audience," says Manning.

Technical and "cultural" barriers

One of the barriers data integrators have encountered is a reluctance by some data owners and regulators to give up their data to widespread use. "We're talking about a fundamental change in the way we do business in our agency," says Washington's Singleton. People are afraid that "if I give you my data, you'll misuse it," he says. To quell these fears, Singleton and his staff have involved business users throughout the agency in the project's design.

The technical challenges are significant, too. One of the most difficult is the attempt to link databases that live on different software and hardware platforms. "We're developing facilities-site in Sybase [a relational database]. The air system is in Informix and the water system is in Oracle," says Singleton. "So we're looking at middleware products that allow you to go back and forth and update information and allow the user to access all three of those systems in a seamless way. If you haven't had standards, or you have various ages of systems out there, you're going to have to deal with issues about multiple platforms."

It's too early to tell whether the efforts to link the agency's various databases will pay off. But users, especially those who deal with the public, are looking forward to the system's full implementation.
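The middleware idea Singleton describes, one request fanned out to several separate databases and answered as if from a single system, can be sketched roughly as follows. This is an illustration only, not the agency's design: three in-memory SQLite databases stand in for the Sybase, Informix, and Oracle systems, and the table layout, the shared `facility_id` key, and all sample rows are hypothetical.

```python
import sqlite3


def make_source(table, rows):
    """Create an in-memory SQLite database standing in for one system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (facility_id INTEGER, detail TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn


# Hypothetical stand-ins for the facilities-site, air, and water systems.
# Each source uses a table with the same name as its dictionary key.
SOURCES = {
    "facility_site": make_source(
        "facility_site", [(1, "Acme, 1st St S"), (2, "Baker Mill, Hwy 9")]
    ),
    "air": make_source(
        "air", [(1, "air permit on file"), (2, "no air permit")]
    ),
    "water": make_source(
        "water", [(1, "wastewater discharge permit")]
    ),
}


def facility_profile(facility_id, sources):
    """Query every source for one facility and merge the answers.

    This only works because all sources share one facility identifier,
    which is the point of building a central facilities database first.
    """
    profile = {}
    for name, conn in sources.items():
        cursor = conn.execute(
            f"SELECT detail FROM {name} WHERE facility_id = ?",
            (facility_id,),
        )
        profile[name] = [row[0] for row in cursor.fetchall()]
    return profile


print(facility_profile(1, SOURCES))
print(facility_profile(2, SOURCES))
```

A facility with no record in one system simply returns an empty list for that source, so the caller sees a gap rather than an error.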
"Down the road, we'll see the benefits," says Roque Nalley, an environmental specialist in EPA's Spokane office. Nalley manages the state's database of confirmed and suspected contaminated sites. "This is something the public has been asking us about for a long time. They've been frustrated about having to go to five or six different programs regarding one site."

"To be relevant in the information age," Singleton believes, "you have to provide people with the information they expect you to have. People who have environmental questions expect us to be able to provide answers. After all, that's our job."

Elaine L. Appleton is a Newburyport, Mass., freelance writer specializing in business, technology, and the environment. Her work has appeared in the Boston Globe and the New York Times.