Assessing the accuracy of citizen scientist reported ... - ACS Publications

Article pubs.acs.org/est

Cite This: Environ. Sci. Technol. 2019, 53, 5633−5640

Assessing the Accuracy of Citizen Scientist Reported Measurements for Agrichemical Contaminants

Jonathan M. Ali,† Brandon C. Noble,† Ipsita Nandi,‡ Alan S. Kolok,§ and Shannon L. Bartelt-Hunt*,†

† Department of Civil Engineering, University of Nebraska Lincoln, Omaha, Nebraska 68182, United States
‡ Institute of Environment and Sustainable Development, Banaras Hindu University, Varanasi 221005, India
§ Idaho Water Resources Research Institute, University of Idaho, Moscow, Idaho 83844, United States


Supporting Information

ABSTRACT: Citizen science is a research tool capable of addressing major environmental challenges, including contamination of water resources by agrichemicals, such as nutrients and pesticides. The objectives of this study were (1) to identify the proportion of accurate observations by citizen scientists using rapid assessment water quality tools, and (2) to characterize how a user’s prior experience with water quality tools was associated with the accuracy of citizen scientists. To achieve these objectives, we conducted group testing with over 136 citizen scientists and compared their results from water quality testing of water samples to results obtained using laboratory analytical methods. Following brief training, we observed that the accuracy of reported results varied with the user’s experience level, with experienced and expert users reporting consistent and reliable measurements. Where erroneous measures were reported, citizen scientists tended to overestimate contaminant concentrations when using colorimetric water quality tools. Additionally, we identified differences in accuracy related to the types of water quality assessment tools used by citizen scientists from each experience group. This study demonstrates the importance of evaluating participant background experience in designing citizen science campaigns.

1. INTRODUCTION The detection and mitigation of agrichemical runoff into surface and groundwater resources is an ongoing global challenge. Nutrients, pesticides, and other agrichemicals enter surface water annually after land application to agricultural soils, contributing to degradation of downstream freshwater resources and coastal environments.1,2 Recent analysis of the U.S. EPA’s Safe Drinking Water Information System database found that although surface water trends for nitrate violations have recently declined, there has been an increasing trend in groundwater contamination from sources including agricultural runoff, confined animal feeding operations, and improperly functioning septic systems.3 Agrichemical contamination of surface water and groundwater presents environmental and human health risks, thereby necessitating monitoring efforts capable of tracking this pervasive and widespread problem.

Citizen science, also referred to as crowdsourced science, is a contemporary research method recently highlighted as an important research technique by the National Academies.4 Unlike traditional sampling strategies, citizen science can address several logistical and technical impediments to large-scale water quality sampling, such as prohibitive per-sample costs, small numbers of samples, and the logistical issues of collecting samples over large temporal and spatial scales.5,6 Simple and affordable test kits provide the potential for increasing sample size and participation by a broad audience of participants.7,8 Large sampling geographies that are both time and cost prohibitive for a single laboratory to sample are not limiting to citizen science campaigns that have monitored water quality across cities,7,9−11 shorelines,12−15 and watersheds as large as the Mississippi River.16 Given the scale of most contemporary water quality problems, especially as they relate to nonpoint source runoff, citizen science is an important, if not essential, tool for large-scale data collection and monitoring efforts. Despite its many advantages, the accuracy of citizen science-collected data remains a scientific concern.

From the perspective of organizers of citizen science programs, potential sources of data bias include lack of proper study design, nonstandardized protocols, sustained participation, and measurement errors from tools of varying complexity.17−20 One aspect of this concern that has yet to be thoroughly explored is the role of pre-existing STEM experience and its influence on the accuracy of citizen scientists. As reviewed by Lewandowski and Specht,21 several studies have demonstrated that citizen scientist-collected data are comparable to professionally collected data. For example, citizen scientists in Toronto, Canada, reported measurements of PO4 and NO3 from colorimetric chemical assays that were comparable to historical surface water data collected by professional scientists.7 To date, there is little evidence regarding limitations in accuracy between subgroups of citizen scientists, such as differences in age or STEM background.

© 2019 American Chemical Society

Received: November 28, 2018
Revised: April 24, 2019
Accepted: April 30, 2019
Published: May 1, 2019

DOI: 10.1021/acs.est.8b06707 Environ. Sci. Technol. 2019, 53, 5633−5640


Environmental Science & Technology

Figure 1. Sampling design for the assessment of citizen science accuracy for three agrichemical contaminants. Citizen scientists were provided laboratory-prepped solutions with various combinations of the three agrichemicals at different concentrations. Concentrations for laboratory-prepared and field-collected water samples are summarized in Table 1.

conducted since 2011 and were not recruited as a convenience sample. Specific recruitment strategies included hosting booths at professional conferences for voluntary participation, holding voluntary drop-in events at the University of Nebraska-Omaha’s Community Engagement Center, recruitment from among prior citizen science volunteers, and advertising in Omaha area schools to work with science educators and students. Approximately 40 of the 86 STEM college students participated in the testing as part of a laboratory course that they were taking for credit. These volunteers were grouped into three separate experience classes: expert, experienced, and inexperienced (Supporting Information (SI) Table S1). Inexperienced testers were defined as testers with no significant laboratory experience or prior training with water quality testing; this group included middle and high school students, as well as college students without exposure to college-level STEM coursework. Experienced testers were those with some exposure to laboratory testing, through a college-level course or other means, whereas expert testers were those who had extensive prior experience with water quality tests or other related laboratory testing methods. The citizen scientists included middle school, high school, and college students as well as individuals recruited from two professional organizations and a local company’s sustainability team. Figure 1 outlines the study design used for the collection and analysis of the citizen scientist measurements that were collected from a series of educational workshops from October 12 through November 17, 2016. Initially, measurements were

If citizen science is to serve as a complementary data collection method to existing and emerging water quality monitoring programs for nutrient and agricultural contaminants, the accuracy of citizen science across different experience levels must be assessed. Thus, the objectives of this study were to (1) identify the proportion of accurate observations by citizen scientists using rapid assessment water quality tools, and (2) characterize how expertise was associated with the proportion of accurate observations made by citizen scientists using these tools. To achieve these objectives, we conducted test groups with over 136 citizen scientists and compared their results from water quality testing of spiked laboratory solutions and field collected samples to results obtained using laboratory analytical methods. The contaminants evaluated in this study were two nutrients, nitrate and phosphate, and a single herbicide, atrazine, which are common nonpoint source pollutants detected across the United States. Given their nearly ubiquitous geographic distribution and availability of rapid assessment test strips, these chemicals are well-suited for monitoring by citizen science programs.

2. MATERIALS AND METHODS 2.1. Citizen Scientist Recruitment and Study Design. To evaluate the accuracy of the citizen scientist-collected data, volunteers were recruited from various student, community and professional groups in Omaha, NE. In this study, the participants were recruited using strategies previously employed by the authors in prior citizen science work


each of the three compounds. One-liter samples were collected for analytical testing of nitrate, phosphate, and atrazine concentrations in the diluted solutions by a contract laboratory (Midwest Laboratories, Omaha, NE). 2.5. Field Water Testing. Field samples were collected from Elmwood Creek, a stream that flows through central Omaha and is surrounded by residential housing, a golf course, a recreational park, and a small shopping complex. As with the laboratory-prepared solutions, samples of the Elmwood Creek water were sent to Midwest Laboratories for measurement of phosphate, nitrate, nitrite, and atrazine concentrations (SI Table S2). 2.6. Statistical Analysis. All data were analyzed using JMP 11 software (SAS, Cary, NC). All reported measurements collected from the expert, experienced, and inexperienced participants were compared with the actual concentrations of nitrate, phosphate, and atrazine determined through the analyses of the water samples by Midwest Laboratories. From this, the reported responses were scored as either accurate, underestimated, or overestimated. Accurate responses were either (1) those where the reported response matched the actual concentration (e.g., participant reported 5 ppm of phosphate when the actual concentration was 5.41 ppm), or (2) those where the reported response flanked an actual concentration that was not discretely recognized by the specified test (e.g., participant reported 5 or 10 ppm of nitrate when the actual concentration was 8.30 ppm). Underestimated responses were those that were below the actual concentration or below the available lower boundary for the appropriate flanking response option on the test strip. Overestimated responses were those that were above the actual concentration or above the available upper boundary for the appropriate flanking response option on the test strip.
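The scoring rule described above can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not the authors' procedure (the study reports using JMP 11): it treats a response as accurate when it falls on or between the two strip values that flank the laboratory concentration, which covers both worked examples in the text but may be more permissive than the authors' rule (1). The function name `score_response` and the two color-scale lists are hypothetical and are not taken from the test strip documentation.

```python
from bisect import bisect_left, bisect_right

# Hypothetical color-scale values (ppm), assumed for illustration only.
NITRATE_SCALE = [0, 1, 2, 5, 10, 20, 50]
PHOSPHATE_SCALE = [0, 5, 15, 30, 50]


def score_response(reported, actual, scale):
    """Score a reported strip reading against a laboratory concentration.

    Returns "accurate", "underestimated", or "overestimated".
    """
    scale = sorted(scale)
    # Lower flank: largest scale value <= actual (clamped to the scale minimum).
    lower = scale[max(bisect_right(scale, actual) - 1, 0)]
    # Upper flank: smallest scale value >= actual (clamped to the scale maximum).
    upper = scale[min(bisect_left(scale, actual), len(scale) - 1)]
    if reported < lower:
        return "underestimated"
    if reported > upper:
        return "overestimated"
    return "accurate"


if __name__ == "__main__":
    # The two examples given in the text.
    print(score_response(5, 5.41, PHOSPHATE_SCALE))   # accurate
    print(score_response(10, 8.30, NITRATE_SCALE))    # accurate
    # Readings outside the flanking values.
    print(score_response(2, 8.30, NITRATE_SCALE))     # underestimated
    print(score_response(20, 8.30, NITRATE_SCALE))    # overestimated
```

With the assumed scales, a nitrate concentration of 8.30 ppm is flanked by 5 and 10 ppm, so either reading counts as accurate, while 2 ppm and 20 ppm are scored as under- and overestimates.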
For the measurements of water samples from Elmwood Creek, Chi-square tests were applied to detect differences in the proportion of each of these response types between user experience levels for each of the test strips. For all statistical tests, statistical significance was assumed at p < 0.05. Relative risk ratios (RR) for accurate, underestimated, and overestimated responses between the different experience levels were calculated from the previously described proportions, combining the results obtained from the laboratory-prepared and field-collected water samples. The RR were calculated using the following equation, where the outcome of interest is the observation type (e.g., accurate, overestimation, underestimation), the comparison group is the numerator, and the reference group is the denominator.

collected from laboratory-prepared solutions provided to inexperienced and experienced participants to screen for issues associated with the training and instructional pamphlets provided along with the test strips. Following adjustment to the instructions, described below, a larger pool of participants including expert (n = 37), experienced (n = 43), and inexperienced (n = 55) citizen scientists were provided field-collected stream water samples for assessment of accuracy (SI Table S1). 2.2. Nitrate, Phosphate, and Atrazine Test Strips. Test group participants evaluated three water quality parameters: nitrate and phosphate (both measured in parts per million (ppm)), and atrazine as a presence/absence test. Measurements of nitrate and phosphate were conducted using colorimetric test strips manufactured by Hach, a test strip platform similar to that used in a previously published citizen-science survey of nutrients.28 The color scale for the test strip of each nutrient indicates discrete concentrations of contaminant after a 30−60 s development period. Measurement of atrazine was done using atrazine test strips (Abraxis), which detect the presence of atrazine at or above 3 ppb following a 10 min incubation period. Each citizen scientist participant was provided with two test strips for each chemical parameter, where one test strip was used to test a laboratory-prepared water sample and the other test strip was used to test a field-collected (natural) water sample. Citizen scientists were provided with two samples in identical containers and were not informed at the time of testing as to the origin of the samples. 2.3. Instructional Pamphlets. At the beginning of each test group, a short verbal introduction and demonstration of the test strips was given along with instructions. The citizen scientists were each issued a set of written instructions for the tests that they would be performing on the spiked samples (see SI Figure S1).
The instructions had data entry locations for each of the citizen scientists to record their test results. As the original data collection was done through the interpretation of handwritten data entries, some of the data were either disregarded or unusable due to illegible notation. Of the original data set of 136 testers, the percentage of unusable data was 1.5%, 15%, and 11.8% for nitrate, phosphate, and atrazine tests, respectively, as these results were either illegible or the citizen scientists were observed copying another’s response. Following initial testing with laboratory-prepared solutions, it was determined that a manufacturer discrepancy in the position of the color-changing pad on the atrazine test strip led to difficulties in the use and interpretation of the atrazine test strip. This was addressed by a modification of the instructional pamphlets (SI Figure S2), where the interpretive diagram of the blue indicator bars for positive and negative readings on the atrazine test strip was adjusted to reflect the discrepancy of specific manufacturer lots. These modified instructions were utilized for the remaining tests on field-collected water samples. 2.4. Laboratory Solution Testing. Laboratory-prepared solutions for the initial tests were prepared so that participants received differing combinations of low, medium, and high concentrations (Figure 1) of nitrate, phosphate, and atrazine. These solutions were prepared by mixing concentrated stock solutions of each compound with filtered tap water to produce three solution combinations for the testing groups on October 12th and October 21st, 2016. All stock solutions were stored at or below 4 °C. Stock solutions were diluted using Omaha tap water to achieve combinations of varying concentrations for

$$
\text{relative risk ratio} = \frac{\left(\dfrac{\text{outcome of interest in comparison group}}{\text{all observation outcomes in comparison group}}\right)}{\left(\dfrac{\text{outcome of interest in reference group}}{\text{all observation outcomes in reference group}}\right)} \tag{1}
$$
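As a worked illustration of eq 1, the relative risk ratio can be computed directly from response counts. The following is a minimal sketch; the function name `relative_risk` and the example counts are hypothetical and are not data from this study.

```python
def relative_risk(outcome_cmp, total_cmp, outcome_ref, total_ref):
    """Relative risk ratio per eq 1.

    outcome_cmp / total_cmp: count of the outcome of interest and of all
    observations in the comparison group (numerator).
    outcome_ref / total_ref: the same counts for the reference group
    (denominator).
    """
    if total_cmp <= 0 or total_ref <= 0:
        raise ValueError("group totals must be positive")
    if outcome_ref == 0:
        raise ValueError("RR is undefined when the reference proportion is zero")
    return (outcome_cmp / total_cmp) / (outcome_ref / total_ref)


if __name__ == "__main__":
    # Hypothetical counts: 30 of 40 accurate responses in the comparison
    # group versus 15 of 40 in the reference group.
    print(relative_risk(30, 40, 15, 40))  # 2.0
```

In this hypothetical case the comparison group is twice as likely as the reference group to report an accurate response.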

A RR equal to 1 indicates an identical probability of that response type between two groups. Odds ratios and corresponding confidence intervals were not calculated due to the lack of all response types (e.g., accurate, underestimated, overestimated) across all experience levels and test strip types. For the responses generated from the nitrate test strips, we further reduced the classification of participant responses to either correct (i.e., previously deemed accurate) or incorrect (i.e., previously deemed under- or overestimated). These observations were used to generate receiver−operator characteristic (ROC) curves for the nitrate results from the


expert, experienced, and inexperienced participants. A ROC curve describes the diagnostic capability of a test based on a binary classifier system that assesses the probability of true positive versus false positive responses, where an area under the curve (AUC) closer to 1.00 indicates greater diagnostic accuracy of the testing method in question and a value near 0.50 indicates poor diagnostic accuracy (Pagano and Gauvreau 2000). The Chi-square distribution test was used to determine whether the AUC of each experience group differed statistically from an AUC of 0.50. Due to a lack of observations across broader concentration ranges for the phosphate and atrazine test strips, their results were excluded from this analysis.
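To make the AUC interpretation concrete, the sketch below computes the area under a ROC curve using its rank-based form: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. How the study mapped strip readings onto a classifier score is not detailed here, so the function name `auc` and the example scores are assumptions for illustration only.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    pos_scores: scores assigned to true positive cases.
    neg_scores: scores assigned to true negative cases.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical scores, not study data.
    print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))
    print(auc([1, 1], [0, 0]))   # perfect separation: 1.0
    print(auc([0.5], [0.5]))     # no discrimination: 0.5
```

A value of 1.0 corresponds to perfect separation of the two classes and 0.5 to a classifier that performs no better than chance, matching the interpretation given in the text.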

proportions of accurate responses of 87.8% and 83.6%, respectively. There was no underestimation (reported false negatives) of either phosphate or atrazine in the laboratory-prepared solutions; however, the mixed-experience group did overestimate (reported false positives) the concentrations of phosphate and atrazine 12.2% and 16.4% of the time, respectively. The nitrate tests were more prone to error, with a lower proportion of accurate responses (60.3%) than the phosphate and atrazine tests, and a higher incidence of underestimating (16.2%) and overestimating (23.5%) the actual concentration of nitrate. This was intriguing, as the phosphate and nitrate tests are both colorimetric response tests along a concentration gradient. The concentration of the nutrients may affect under- and overestimation of concentrations that citizen scientists report; however, the number of nutrient concentrations tested (Table 1) limits the interpretation of such an effect based on the present study. The atrazine test differs from the nitrate and phosphate tests as it generates a qualitative positive or negative response. The accuracy for the atrazine tests (83.6%) in the initial testing was lower than that observed in previous citizen science surveys using the same test strip,16 likely due to a manufacturing discrepancy of the atrazine test strips. The response of these test strips differed from that of test strips with other manufacturer lot numbers,16,22−24 in that the alignment and color intensity of the response window deviated from the manufacturer’s instructions; based on our laboratory confirmation, however, this did not affect the technical accuracy of the test strip. We modified the instructions for the atrazine test strips in our testing procedure (SI Figures S1 and S2) to account for this discrepancy and for difficulties in interpretation of the test strip as communicated by participants.
Using the modified testing instructions (SI Figure S2), the accuracies of inexperienced, experienced, and expert citizen scientists were compared following testing for the same three contaminants in field-collected water samples (Figure 3; Table 1). As described in Materials and Methods Sections 2.2 and 2.6, each citizen scientist reported results from a pair of test strips for each chemical, with one result for the laboratory-prepped sample and the other for the natural water sample. We observed differences in the proportion of accurate responses based on the user’s experience level and, again, the type of test strip used. Specifically, there were stark differences in accuracy of all three user groups between the colorimetric test strips for nutrients and the immunochromatographic test strip for the herbicide atrazine. Despite modifications to the instructions to control for instruction-related errors in interpretation, we observed reduced accuracy of the atrazine test strip results relative to the preliminary assessment. Significant differences were found between the atrazine test results based on user experience (Chi-square = 29.386, df = 2, p < 0.0001), but the atrazine test strips were the least accurate (≥14%) of any water quality parameter. Experienced users were the least accurate in interpreting the atrazine test strips (14%), whereas inexperienced and expert users were marginally more accurate, reporting a correct measurement in 50% and 74.3% of their samples, respectively. This variation in accuracy is striking, as the atrazine test strip uses an immunochromatographic test method, the same approach used in at-home medical diagnostic test strips, which are designed for use with minimal instruction and are expected to be highly accurate. For example, at-home HIV tests are highly accurate (≥93%) and designed for personal use across a broad

3. RESULTS AND DISCUSSION The objectives of this study were to (1) identify the proportion of accurate observations by citizen scientists using rapid assessment water quality tools, and (2) characterize how expertise was associated with the proportion of accurate observations made by citizen scientists. To address these objectives, citizen scientists from differing experience levels were assessed for the accuracy of their reported measurements of three common agricultural contaminants in laboratory-prepared and field-collected water samples. Overall, we found that accuracy varies based on user experience level, and that in the case of erroneous measures, citizen scientists tend to overestimate contaminant concentrations when using colorimetric tools. It should be noted that in this study the citizen scientists were not age-matched across experience levels, and the inexperienced citizen scientists, several of whom were middle and high school students, were generally younger than the expert citizen scientists, who were college students with laboratory experience and working professionals. There was significant overlap in the recruitment groups for inexperienced and experienced groups. Future work should consider both the age and experience level of the citizen scientists when evaluating accuracy. Preliminary assessment of accuracy using laboratory-prepped solutions revealed differences in accuracy by test type and potential sources of error from instructions provided to citizen scientists. There was significant variation in the proportion of accurate responses reported by citizen scientists, which ranged from as low as 14% while using the atrazine test strips to as high as 100% while using the nitrate test strips (Figures 2 and 3). Phosphate and atrazine test strips provided similar

Figure 2. Proportions of accurate and inaccurate responses following preliminary assessment of test strips using the original instructions (SI Figure S1).


Figure 3. Proportions of accurate and inaccurate responses based on user experience levels (SI Figure S2). Differences were detected between experience levels in the proportion of accurate and inaccurate responses across all contaminant test strips (Chi-square test; p < 0.05).
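The Chi-square comparison of response proportions across experience levels can be sketched as follows. This is a minimal pure-Python sketch of Pearson's Chi-square test of independence, not the JMP 11 analysis used in the study, and the counts are hypothetical. For a table with three experience levels and two response types the degrees of freedom equal 2, and for that case only the upper-tail p-value has the closed form exp(−x/2), which the helper `p_value_df2` uses. Both function names are assumptions for illustration.

```python
import math


def chi_square_independence(table):
    """Pearson Chi-square statistic and degrees of freedom.

    table: list of rows of observed counts, e.g. one row per experience
    level and one column per response type.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def p_value_df2(stat):
    """Upper-tail p-value of the Chi-square distribution, valid for df = 2 only."""
    return math.exp(-stat / 2.0)


if __name__ == "__main__":
    # Hypothetical counts of (accurate, inaccurate) responses for three
    # experience levels; these are not the study's data.
    counts = [[30, 10], [10, 30], [20, 20]]
    stat, df = chi_square_independence(counts)
    print(stat, df)            # 20.0 2
    print(p_value_df2(stat))   # well below 0.05
```

For tables with other dimensions, a general Chi-square survival function (for example from a statistics library) would be needed in place of `p_value_df2`.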

Table 1. Actual Concentrations of Nitrate, Phosphate, and Atrazine Found in Laboratory-Prepared and Field-Collected Water Samples Provided to Citizen Scientists for Testing^a

citizen scientist      date prepared     nitrate     phosphate     atrazine
experience level       or collected      (ppm)       (ppm)         (ppb)

Laboratory Prepared Solutions
inexperienced          10/12/2016        43.70        0.27
                                          3.86        5.41
                                          8.30       13.60
experienced            10/21/2016        44.4         0.09
                                          4.69        0.08
                                         15.3         0.10