Long-Term Worldwide QA/QC of Dioxins and Dioxin-like PCBs in Environmental Samples
Bert van Bavel, Örebro University (Sweden)
Esteban Abad, IIQAB-CSIC (Spain)
A difficult and complicated measurement yields to better methodology, instrumentation, and data handling.
© 2008 American Chemical Society
June 1, 2008 / Analytical Chemistry

Dioxins are unwanted byproducts of several industrial processes, including herbicide production, paper bleaching, chlorine production, incineration of hazardous and municipal waste, and metallurgical processing. Other secondary anthropogenic sources include sewage sludge from wastewater and contaminated industrial sites (1, 2). In addition to dioxins resulting from human activity, dioxins can be found in very old layers of clay, the result of natural processes. However, naturally formed dioxins make up a very small percentage of the total compared with the amount from anthropogenic sources (3, 4). A lively discussion on the occurrence, distribution, and formation of dioxins and dioxin-like compounds has been ongoing in the literature since the early 1970s. Dioxins are now routinely analyzed in a large number of different matrices and are still in the news because of accidental food contamination, unacceptable levels in fish, and a poisoning attempt (5–12). Dioxins have never been deliberately synthesized, other than in extremely small amounts for use as analytical standards or in toxicological tests. The levels of dioxins in environmental samples are normally in the parts-per-trillion range, and thus some analytical skill is required to detect them among the many potential interfering compounds present at much higher concentrations.

A large number of national dioxin inventories have been done in Europe, the U.S., Canada, and Japan (13). Far fewer data are available from developing countries in Africa, South and Central America, and Asia. Several local initiatives have started within the framework of the Stockholm Convention on Persistent Organic Pollutants (POPs), and it is important that the quality of the data from those new initiatives be controlled (14). This is especially critical when the results of national inventories are compared with one another. The analytical variance should be as small as possible and quantifiable. Only in this way can solid decision making take into account analytical errors in comparisons among databases of different nations and regions.

An approach that has worked very well at improving analytical quality and quantifying uncertainty is participation in interlaboratory comparisons, also known as intercalibration studies. The results from >15 years of quality assurance/quality control (QA/QC) of dioxins show that it is possible to increase the worldwide number of laboratories doing this analysis and to improve the quality of the analysis. This approach is also useful for PCBs, brominated flame retardants, pesticides, and heavy metals. Dioxin analysis is one of the most complicated tests to perform and is the perfect example of what can be achieved with strict QA/QC and long-term interlaboratory studies. Knowledge acquired from dioxin analysis can be readily applied to other more easily analyzed target compounds.
The toxic equivalency factor concept
Although often reported as only one value, the dioxins and the structurally similar furans consist of 75 dioxin (PCDDs) and 135 furan congeners (PCDFs). The structures of both classes are characterized by a planar configuration of two aromatic rings joined together by C–O and C–C bonds and different chlorine substitutions on the aromatic rings (Figure 1). All 210 congeners are different compounds and have distinct chemical, physical, and toxicological properties. PCDD/Fs are stable in the environment, are persistent, and may accumulate in biota. They meet all the POP criteria and are listed in the Stockholm POPs list (15). Not all dioxins are toxic at extremely low concentrations; only those with chlorine substitution in the 2,3,7,8 positions produce adverse effects. This reduces the number of toxicologically interesting PCDD/Fs to 7 dioxins and 10 furans; the dioxin 2,3,7,8-TCDD was classified as a human carcinogen in 1997 (16).

Various matrices contain different mixtures of dioxin or furan congeners. For example, fly ash samples may contain almost all the congeners, which is also often the case for soil and sediment samples. Other samples contain only a limited number of congeners; for example, sewage sludge is dominated by the higher chlorinated dioxins, notably OCDD. Samples taken from organisms that are relatively high in the food chain, including humans, almost always contain only congeners with chlorine substitutions in the 2,3,7,8 positions.

Dioxins and furans with chlorine substitutions in the 2,3,7,8 positions are potent inducers of the aryl hydrocarbon receptor (AhR). After binding to the AhR, the ligand–receptor complex is translocated in the cell and induces transcription of the CYP1A gene. This mediates a cascade of biochemical and toxic events. Non-2,3,7,8 compounds do not show activity to the same extent. To provide a frame of reference, the toxicity of the 2,3,7,8 compounds is often related to that of the most toxic dioxin, 2,3,7,8-TeCDD. This compound is assigned a toxic equivalency factor (TEF) of 1; the TEFs of the 17 PCDD/Fs range from 1 to 0.0001 (Table 1). This assignment assumes the same toxicity mechanism and additivity of the toxicity (17).

Table 1. PCDD/F TEFs according to WHO (1998).
Compound               TEF
2,3,7,8-TeCDD          1
1,2,3,7,8-PeCDD        1
1,2,3,4,7,8-HxCDD      0.1
1,2,3,6,7,8-HxCDD      0.1
1,2,3,7,8,9-HxCDD      0.1
1,2,3,4,6,7,8-HpCDD    0.01
OCDD                   0.0001
2,3,7,8-TeCDF          0.1
1,2,3,7,8-PeCDF        0.05
2,3,4,7,8-PeCDF        0.5
1,2,3,4,7,8-HxCDF      0.1
1,2,3,6,7,8-HxCDF      0.1
1,2,3,7,8,9-HxCDF      0.1
2,3,4,6,7,8-HxCDF      0.1
1,2,3,4,6,7,8-HpCDF    0.01
1,2,3,4,7,8,9-HpCDF    0.01
OCDF                   0.0001

Also, certain PCBs can adopt a structure with the two biphenyl rings located in the same plane, which gives them a toxicity similar to that of dioxins. This configuration is energetically favorable for PCBs with no chlorine substitution in the ortho positions but also can be achieved by PCBs with only one chlorine in the ortho position. This has been recognized in three of the planar PCBs (77, 126, and 169) and for the mono-ortho substituted PCBs (105, 114, 118, 123, 156, 157, 167, and planar 81; Table 2).

Table 2. PCB TEFs according to WHO (1998).
Compound      TEF
Non-ortho
PCB 77        0.0001
PCB 126       0.1
PCB 169       0.01
PCB 81        0.0001
Mono-ortho
PCB 105       0.0001
PCB 114       0.0005
PCB 118       0.0001
PCB 123       0.0001
PCB 156       0.0005
PCB 157       0.0005
PCB 167       0.00001
PCB 189       0.0001

When the levels of dioxins and furans are reported, the actual congener concentrations in an environmental sample are multiplied by the respective TEFs, and the total toxic equivalent (TEQ) is calculated by

$$\mathrm{TEQ} = \sum_{i=1}^{N} C_i \times \mathrm{TEF}_i$$

in which $C_i$ is the concentration of congener $i$, $\mathrm{TEF}_i$ is its toxic equivalency factor, and $N$ is the number of congeners included.
Although the TEF concept was originally developed to assess the toxicity of abiotic samples to humans, it has proven very useful in assessing the toxicity of a variety of environmental samples, including biological ones. The TEQ is often used for legislation and risk assessment and management (18, 19). Because only 17 dioxins and 12 dioxin-like PCBs must be analyzed out of 210 and 209 compounds, respectively, a selective, specific, and robust methodology with sufficient sensitivity is required to achieve analysis at parts-per-trillion levels. Coelution or interferences can result in large errors and wrong TEQs.
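As a minimal sketch of the TEQ sum described above, the Python snippet below multiplies each congener concentration by its WHO (1998) TEF from Table 1 and adds the products. The function name `toxic_equivalent` and the sample concentrations are assumptions made purely for illustration; they are not data from the intercalibration study.

```python
# WHO (1998) TEFs for the 17 2,3,7,8-substituted PCDD/Fs, copied from Table 1.
WHO_1998_TEF = {
    "2,3,7,8-TeCDD": 1.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "1,2,3,4,7,8-HxCDD": 0.1,
    "1,2,3,6,7,8-HxCDD": 0.1,
    "1,2,3,7,8,9-HxCDD": 0.1,
    "1,2,3,4,6,7,8-HpCDD": 0.01,
    "OCDD": 0.0001,
    "2,3,7,8-TeCDF": 0.1,
    "1,2,3,7,8-PeCDF": 0.05,
    "2,3,4,7,8-PeCDF": 0.5,
    "1,2,3,4,7,8-HxCDF": 0.1,
    "1,2,3,6,7,8-HxCDF": 0.1,
    "1,2,3,7,8,9-HxCDF": 0.1,
    "2,3,4,6,7,8-HxCDF": 0.1,
    "1,2,3,4,6,7,8-HpCDF": 0.01,
    "1,2,3,4,7,8,9-HpCDF": 0.01,
    "OCDF": 0.0001,
}


def toxic_equivalent(concentrations, tefs=WHO_1998_TEF):
    """Return TEQ = sum(C_i * TEF_i), in the units of the input concentrations."""
    unknown = set(concentrations) - set(tefs)
    if unknown:
        # Refuse to silently ignore congeners that have no TEF assigned.
        raise KeyError(f"No TEF available for: {sorted(unknown)}")
    return sum(conc * tefs[name] for name, conc in concentrations.items())


# Hypothetical congener concentrations in pg/g (made-up values, illustration only).
sample = {
    "2,3,7,8-TeCDD": 2.0,
    "1,2,3,7,8-PeCDD": 1.0,
    "OCDD": 1000.0,
    "2,3,4,7,8-PeCDF": 4.0,
}

teq = toxic_equivalent(sample)
print(f"TEQ = {teq:.2f} pg TEQ/g")
```

In this made-up sample, OCDD is present at by far the highest concentration yet contributes least to the TEQ, which illustrates why coelution or interference on a high-TEF congener distorts the result far more than the same error on OCDD.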
Chemical analysis
Congener-specific analysis of dioxins is predominantly performed by high-resolution (HR) GC coupled to HRMS after elaborate extraction and cleanup procedures. Soxhlet extraction with organic solvents is the most commonly used technique for solid samples, although accelerated solvent extraction and supercritical fluid extraction are also used. For the analysis of ash samples, there is some debate whether sample pretreatment with acid is necessary. For sediments, the addition of copper to remove sulfur compounds might be necessary during extraction or cleanup. Further cleanup is done by open-column chromatography with acid–base-modified silica and alumina or Florisil (20). The final separation is done on carbon, where the planar compounds are separated from the nonplanar ones. The sample cleanup after extraction is similar for many matrices, including biological samples.

FIGURE 1. The chemical structure of chlorinated dioxins and furans. (Panel labels: 75 congeners; 135 congeners; 2,3,7,8-TCDD.)

Although cleanup and extraction procedures have improved and automated systems have become available, many laboratories still use procedures that can take more than a day to prepare 5–10 samples for GC/MS analysis (21, 22). To achieve separation before MS detection, 1–2 µL of the concentrated final extract (10–50 µL) is injected on an HRGC column of at least 25 m. Nonpolar columns with 5% phenyl/95% methylpolysiloxane phase materials and internal diameters of 0.18–0.32 mm, depending on the length and film thickness of the columns, are routinely used. Custom-made columns [...] 10,000 is used, but low resolution (quadrupole) and MS/MS (ion trap) have also been used for samples containing high levels of the target compounds (25).

In addition, biological detection of dioxins and dioxin-like compounds includes reporter gene assays (CALUX, P450-based cell lines) and immunoassays (Ah- or enzyme-based). The reporter gene assays have been shown to be especially helpful for screening large numbers of samples (26). However, bioassays and chemical analysis can result in two completely different numbers. Whereas bioassays measure the overall dioxin-like toxicity, chemical analysis specifically measures the toxicity of the 17 PCDD/Fs and the 12 dioxin-like PCBs.

In addition to the improvements to instrumentation, the synthesis of ¹³C-labeled internal standards has improved the quality of analysis. All 17 2,3,7,8-substituted PCDD/Fs and PCBs with TEFs assigned by the World Health Organization (WHO) are now available as ¹³C-labeled internal standards that are sufficiently pure for use as sample, internal, or recovery standards. ¹³C-labeled standards are assumed to behave chemically like native dioxins during extraction and cleanup, but they can be differentiated by MS. Quantification is done by using the internal standard/isotope dilution method and calculating relative response factors (RRFs) within a set value (15%) from calibration curves containing the ¹³C-labeled standards and all target compounds with at least five concentrations. The RRFs are then used to calculate the amount in the sample; this way, the final results are automatically adjusted for any losses sustained during extraction and cleanup.

Standard methods are available from the U.S. Environmental Protection Agency (EPA), the European Union, and Japan, and these are used with or without modifications by many of the laboratories doing dioxin analysis (27–30). Important QA/QC criteria include the recovery of the internal standards, the relative retention time of the target compound compared with the internal standard, and the chlorine isotope ratio falling within 15% of the theoretical chlorine isotope ratio when the two most abundant ions of the chlorine cluster are measured. Additional QA/QC includes a blank sample (with each batch of samples) with values that are [...]

FIGURE 2. The original chromatogram of dioxins on municipal solid waste fly ash. (Adapted with permission from Ref. 35.) (Traces: tetra-, penta-, hexa-, hepta-, and octa-CDD; time axis 25–40 min.)

[...] first found in The Netherlands in 1977 (35). This discovery was made by using low-resolution GC/MS with a packed column. Figure 2 is the first chromatogram of dioxin on fly ash, showing the different dioxins and furans at each chlorination level. A lot has changed as a result of improved capillary columns and high resolution. Fly ash from the 1977 batch was analyzed again in 2002 by HRGC/HRMS on a capillary DB-5MS column. Figure 3 shows the tetra dioxin isomers together with the ¹³C-labeled internal and recovery standards.
The improvement in resolution and identification of the toxic 2,3,7,8-substituted isomer is obvious.
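The isotope dilution arithmetic and the isotope ratio criterion described in this section can be sketched in Python as follows. The article does not spell out the RRF formula, so the conventional isotope dilution form (native response per unit concentration divided by labeled response per unit concentration) is assumed here. The 15% limit on the RRFs is interpreted as a limit on their RSD across the calibration levels, which is also an assumption. All peak areas, concentrations, function names, and the example theoretical ratio of 0.77 are hypothetical.

```python
from statistics import mean, stdev


def relative_response_factor(area_native, area_labeled, conc_native, conc_labeled):
    """RRF for one calibration level (assumed conventional isotope dilution form)."""
    return (area_native * conc_labeled) / (area_labeled * conc_native)


def rsd_percent(values):
    """Relative standard deviation in percent."""
    return 100.0 * stdev(values) / mean(values)


def quantify(area_native, area_labeled, amount_labeled_spiked, mean_rrf):
    """Amount of native congener in the sample.

    Because the labeled standard is spiked before extraction, losses during
    extraction and cleanup affect both peaks alike and cancel in the ratio.
    """
    return (area_native * amount_labeled_spiked) / (area_labeled * mean_rrf)


def isotope_ratio_ok(area_ion_1, area_ion_2, theoretical_ratio, tolerance=0.15):
    """Check that the measured ion ratio is within 15% of the theoretical ratio."""
    measured = area_ion_1 / area_ion_2
    return abs(measured - theoretical_ratio) <= tolerance * theoretical_ratio


# Hypothetical five-level calibration: (native conc, native area, labeled area).
# The labeled standard is at a fixed concentration of 100 at every level.
calibration = [
    (0.5, 52, 10000),
    (2.0, 205, 10100),
    (10.0, 1010, 9900),
    (40.0, 4100, 10050),
    (200.0, 19800, 9950),
]
rrfs = [relative_response_factor(a_n, a_l, c_n, 100.0) for c_n, a_n, a_l in calibration]
rrf_rsd = rsd_percent(rrfs)
assert rrf_rsd <= 15.0, "calibration fails the assumed 15% RRF criterion"

# Hypothetical sample spiked with 100 pg of labeled standard before extraction.
amount = quantify(area_native=830, area_labeled=9800,
                  amount_labeled_spiked=100.0, mean_rrf=mean(rrfs))
print(f"RRF RSD = {rrf_rsd:.1f}%, amount = {amount:.2f} pg")

# 0.77 is only an example theoretical ratio; use the value for the ions monitored.
print(isotope_ratio_ok(77, 100, theoretical_ratio=0.77))
```

The sketch makes no separate recovery correction: the sample amount depends only on the native-to-labeled area ratio, which is why results are adjusted automatically for losses as long as the labeled standard behaves like the native compound.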
Worldwide laboratory intercomparison
In the 1980s and early 1990s, the analysis of dioxins was developed and refined in the research environment. Although international laboratory comparisons were recognized as valuable tools, only a limited number of laboratories were capable of taking part in such studies for complex matrices. The first intercalibration studies were performed on fish, fly ash, and human blood samples (36–40). The maximum number of participants was 15. However, in the late 1990s, the number of dioxin laboratories worldwide expanded; in Japan alone, >100 laboratories were eventually established. Starting with 10 laboratories (2 European, 1 U.S., 7 Japanese) in 1992, the study soon grew to >100 participants, and currently >200 laboratories are in the program. In 2000, studies began on feed and foodstuffs after the Belgian food crisis (when large amounts of dioxins originating from illegal disposal of PCB oil were found in chicken eggs) and consequent regulation (41). Typically, three or four real samples and one to three unknown standard solutions were sent out annually. The number of target compounds began with only the 17 dioxins, and 3 non-ortho planar PCBs were added in 1994. The mono-ortho PCBs with WHO TEFs were added in 1999. In total, >150,000 data points were submitted by all participants from 1992 to 2007. This unique database provided excellent material for an in-depth study with the objective of determining whether the quality of analysis had improved during that 15-year period.

FIGURE 3. The chromatogram of the fly ash sample used in Figure 2 on a DB-5MS column. (Traces: 2,3,7,8-TCDD and ¹³C-2,3,7,8-TCDD; homologue groups TCDD, PeCDD, HxCDD, HpCDD, OCDD; axis: time.)

Long-term QA/QC
Since 1997, an ISO/IEC guide has existed for the organization of intercalibration proficiency testing (42); this standard is based on the international harmonized protocol for proficiency testing of analytical laboratories, which has recently been revised (43). Most programs follow this protocol with slight modifications. Laboratories treat the samples as routine and use their own analytical protocol and spiking and standard solutions. According to the harmonized protocol, soil, sediment, and sludge samples for the international intercalibration study have been prepared by the Wageningen Evaluating Programmes for Analytical Laboratories (WEPAL; The Netherlands) and tested for homogeneity by analyzing the carbon content or the concentration of several heavy metals (cadmium, copper, lead, zinc). Since 1999, certified standard solutions used as unknown samples have been supplied by Wellington Laboratories (Canada). The fly ash samples and extracts are prepared at different locations, and homogeneity is tested by analyzing carbon content or heavy-metal concentrations. Complete dioxin analysis of the samples for this purpose is not feasible from practical and economic points of view. No information on the levels or interferences is given. On one occasion, when the original 1977 fly ash was used, the participants were informed of the relatively high levels of dioxins before the analysis.

All samples are shipped by international carrier in accordance with all national and international safety regulations in a special iron container filled with absorbent material. The shipment is traceable online. Although stricter security and more paperwork have been required since 2001, the samples arrive at the laboratories in 2–5 days. Results of all target compounds are submitted directly into a database by a set deadline within 2–3 months of arrival. After the results are inspected, only corrections of transcription errors and calculation mistakes with a plausible explanation are accepted, and the corrected data points are placed into the database. Because of the complexity, cost, and time involved in performing a full chemical dioxin analysis, it is normally not feasible to analyze duplicate samples. Interlaboratory precision is tested occasionally by using the same sample within one round, without the participants having any prior knowledge. Long-term reproducibility is evaluated by reusing the same sample in different years, again without the participants' prior knowledge.

From the raw data, a consensus value is determined by calculating a raw mean value from all entries with the PCDD/F TEQ. Data outside twice the calculated RSD are considered outliers and are removed. The data are visually inspected, and a new mean and RSD are calculated when the data are relatively normally distributed. This new RSD is then used to calculate the z-score for all participants. The RSDs, after removal of outliers throughout the years, give an indication of the analytical quality of all the laboratories. Figure 4 provides a summary of these data for the dioxin TEQ from 1994 to 2006. The error bars represent the minimum and maximum RSD for each sample type and each year. The RSD for the standard solution initially distributed in 2000 varied 8–17% for all laboratories after removal of the outliers. The RSD of the extracts distributed in 1994–2000 varied 11–22%. The RSD for the real soil/sediment samples distributed in 1998–2006 was 9–53% and for the ash samples, 17–115%.

FIGURE 4. The average RSD by year after removal of outliers (±2 RSD). (Series: number of participants and RSD for ash, soil, standard, and extract; axes: RSD (%) and participants, 1994–2006.)

Comparing the RSD data with the number of participants suggests that rapid growth in participation can have a negative effect on the quality of the data. In the period 1994–99, the number of laboratories expanded quite rapidly, predominantly as a result of new laboratories being established in Asia. A similar rise was seen again starting in 2001, triggered by the Belgian food crisis. A large number of inexperienced laboratories started doing dioxin analysis and joined the study. It is quite likely that this influenced the overall quality and RSDs. After these laboratories became more experienced and proficient, the total RSD became smaller again in 2004–06. This trend seems to hold for the preliminary 2007 data, which are currently being evaluated.

No clear relation between the concentration and the RSD was seen throughout the years (1994–2006), and the difficulty level of the samples had more influence on the results than the dioxin levels did. Samples in the same concentration range showed very different RSDs depending on the complexity and interferences. The soil and sediment samples, especially the sewage sludge and ash, showed (as expected) a larger RSD than less complex samples such as extracts and standard solutions.

FIGURE 5. Absolute z-scores of three selected laboratories. (Axes: |z-score|, 1998–2006.)

Fit-for-purpose criteria
The data from all the laboratories (and excluding outliers) could not be used to obtain "fit-for-purpose" criteria for the different types of samples. Such criteria are based on a level of uncertainty that is determined by the analytical capabilities of
the laboratories and the usage of the results (for example, legislation). The results were biased by the large number of new laboratories joining the study at different times and by the relatively weak criteria used to delete outliers that had been in place since 1992. Therefore, the results of 18 laboratories that had been successfully participating in the study since 1994 were selected and further analyzed. All laboratories used HRGC/HRMS systems. Calculating the RSD based on the data from these laboratories should result in a better selection of fit-for-purpose criteria. The lowest RSD, 9% (SD 4%, n = 10), was achieved for the unknown standard solutions, which did not need any cleanup or extraction, followed by sample extracts with an RSD of 15% (SD 4%, n = 13), which needed cleanup before GC/MS analysis. For the 29 soil and sediment samples, the overall RSD of the 18 labs was 14% (SD 7%, n = 29). The ash samples (often pretreated with acid before extraction) exhibited the highest overall RSD of 19% (SD 8%, n = 25). All RSDs were calculated over a concentration range of 0.003–50 ng TEQ/g. The RSDs were significantly lower than the theoretical value [...] 50 interlaboratory studies on trace analyses of several compounds, including pesticides, drug formulations, [...] relationship between dioxin levels and RSD was found in the low parts-per-billion range. The data are in better agreement with the Thompson function, an improved version of Hr, which includes more recent interlaboratory data on low levels of mycotoxins in food (45). Hr is a good predictor of performance of laboratory comparison studies in a concentration range of [...] with theory and practical experience and are in line with current EU legislation.

Table 3. Calculated RSDs over the concentration range 1–10,000 ppb.
Sample          All labs¹        Expert labs      Hr          Thompson      Eppe        Fit for purpose
                RSD (%)   SD     RSD (%)   SD     (%; 44)     (%; 45, 46)   (%; 47)     (%)
Standard        11        3      9         4      >45         22            5–15        10
Extract         15        4      15        4      >45         22            5–15        15
Soil/sediment   21        12     14        7      >45         22            5–15        20
Ash             35        19     19        8      >45         22            5–15        20
¹After removing statistical outliers.

z-Scores
Two common approaches for the validation of intercalibration studies use z-scores. In the first approach, z-scores are based on the deviation from the mean or median consensus value in units of the SD, so $z = |x_i - X|/\mathrm{SD}$, in which $x_i$ is the reported value and $X$ is the consensus value or mean value; z-scores are based on the SD of all participants after obvious outliers are deleted. The SD is dependent on the participants' data and reflects the quality of the specific analysis at a certain time. The other approach is to calculate z-scores by $z = |x_i - X|/\sigma$, in which $\sigma$ is set to 10–30% of the concentration, depending on the application. This way, z-scores are independent of the variation between laboratories. The first approach is often useful for newly developed methodology, whereas the second is often used for more established methods. Interestingly, after >15 years, the two approaches result in very similar z-scores. This is because the RSDs (Table 3) of both theoretical approaches, fit-for-purpose approaches, and the real data are very close, especially in the later years of the study (41, 45, 47).

Figure 5 shows (for three expert laboratories) that the individual z-scores are very useful for validating the performance of analytical laboratories. Approximately 95% of the absolute z-scores should be [...] unacceptable and should be investigated. All three laboratories performed well, with z-scores generally [...] 2 during 2003, with one extreme value >3; action was taken, and in subsequent years the z-scores returned to satisfactory levels. Identical ash and soil and sediment samples distributed in three or more successive years without the laboratories' prior knowledge also showed that long-term reproducibility for all three laboratories was good for both the total TEQ (