Environ. Sci. Technol. 2007, 41, 1653-1661
Multivariate Chemical Mapping of Antibiotics and Identification of Structurally Representative Substances
ESTER PAPA,‡,§ JERKER FICK,† RICHARD LINDBERG,† MAGNUS JOHANSSON,| PAOLA GRAMATICA,‡ AND PATRIK L. ANDERSSON*,†
Department of Chemistry, Environmental Chemistry, Umeå University, SE-901 87 Umeå, Sweden, and Department of Structural and Functional Biology, QSAR in Environmental Chemistry and Ecotoxicology Research Unit, University of Insubria, Via J.H. Dunant 3 - 21100 Varese (Italy), and Astra Zeneca R&D, Mölndal, SE-431 83, Sweden
Antibiotics used in human and veterinary medicine have been found in samples from diverse environments in many parts of the world. To assess the environmental risks associated with them, data regarding their toxicity, occurrence, and fate are needed, but gathering such data is time-consuming and expensive. An efficient approach to address these difficulties would be to select a small subset of antibiotics with a wide variation in chemical characteristics, perform experimental tests on this subset, and then extrapolate the results to larger numbers of antibiotics, including the most potentially hazardous compounds. To assess the potential utility of such an approach, a set of 92 antibiotics for human use was studied, and their structural properties were described with 24 chemical descriptors that included information on their steric, lipophilic, and electronic properties. Principal component analysis in combination with statistical experimental design was used to map the chemical diversity of the antibiotics and to select a small subset, a "training set", of 20 antibiotics. The chemical representativity of the training set was assessed in a quantitative structure-activity model established to predict ultimate biodegradation. The selected antibiotics were shown to cover the chemical variation of the studied antibiotics and are suggested for use in future testing programs to assess the fate and effects of antibiotics in the environment.
* Corresponding author phone: +46-90-7865266; fax: +46-90128133; e-mail: [email protected]. † Umeå University. ‡ University of Insubria. § Visiting Ph.D. Student at Umeå University. | Astra Zeneca R&D.
10.1021/es060618u CCC: $37.00 © 2007 American Chemical Society Published on Web 01/25/2007
Introduction Concern about the occurrence, fate, and possible adverse effects of pharmaceutical substances in the environment is increasing. Their impact has not been elucidated in detail, but many scientists have stressed the potential risks associated with these chemicals (1-3). Substantial amounts of active pharmaceutical substances are often excreted, largely unchanged, in urine and feces (4), and many have been detected in sewage treatment plant effluents, including various antibiotics (1-3, 5-6), estrogens (2, 7), and anti-inflammatory drugs (2-3, 8). Antibiotics, defined here as pharmaceuticals used to prevent or treat infectious diseases, are used throughout the world in both human and veterinary medicine. The environmental risks associated with antibiotics arise from their potential to promote the development and/or maintenance of antibiotic-resistant microorganisms (9) and from their toxic effects (10, 11). The global consumption of antibiotics (as active substances) amounts to 100 000-200 000 tonnes per year (12), and the annual consumption in Sweden has been estimated at approximately 100 tonnes. Many active antibiotic pharmaceuticals have been found in matrices such as hospital effluents (10, 13), sewage treatment water and sludge (1, 6), streams (2, 12), and manure (14). Accurate environmental risk assessments require data on measured or predicted levels of antibiotics and on their ecotoxicological effects, but information on their fate in the environment and their effects on ecosystems is currently scarce. Hence, appropriate tools are needed to accurately measure or predict pharmaceutical levels and to investigate their fate and long-term toxicity in the environment. When the Swedish Medical Products Agency conducted environmental risk assessments of 30 pharmaceuticals, only acute toxicity was considered, owing to a lack of relevant data (15). However, at measured environmental levels, acute toxicity is of minor importance compared with long-term exposure to sub-therapeutic concentrations and possible synergistic effects (16, 17). One approach to facilitate risk assessment and the screening of fate and effects for large groups of compounds is to develop quantitative structure-activity relationship models (QSARs): mathematical expressions that relate molecular descriptors, encoding the characteristics of the chemicals, to a response.
A systematic chemometric scheme for the development of predictive QSARs involves, as a first step, mapping the chemical variation of the studied chemicals, followed by a selection strategy to identify training set compounds (18-21). A chemical map of the target chemicals, constructed from multiple molecular descriptors using, e.g., principal component analysis (PCA) (22), constitutes a sound basis for selecting structurally diverse chemicals. These should serve as representatives (training set) of the studied chemical domain. Training sets can be selected by applying statistical experimental design tools, such as factorial design or D-optimal design (18, 19). The number of training sets and their size depend on the chemical diversity of the studied chemicals and the specificity of the assessed property. If a large number of chemicals with considerable structural diversity are under investigation, or if a very specific property is studied, several discrete training sets may be needed to cover the chemical variation. The selected chemicals are subsequently suggested for inclusion in national and international testing programmes that assess critical environmental properties, to improve future risk assessments. Furthermore, these compounds and their estimated properties should be used as objects in the development of QSAR models. By using these objects as a training set, QSARs can be developed and applied to predict properties of structurally similar, non-tested compounds from their molecular properties. This multistep approach has recently been tested and applied to various diverse organic compounds (21, 23-25). In this study, the described strategy has been applied to antibiotics, a highly environmentally relevant group of chemicals for which data on fate and effects are, in principle, lacking. The major aim of the study was to map the chemical variation of antibiotics and to select structurally representative substances for future fate and toxicity testing.
VOL. 41, NO. 5, 2007 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
TABLE 1. Antibiotics Included in the Study (n = 92)
no. | name | ATC code (a) | CASRN (b)
1 | Doxycycline | J01AA02 | 000564-25-0
2 | Lymecycline | J01AA04 | 000992-21-2
3 | Oxytetracycline | J01AA06 | 000079-57-2
4 | Tetracycline | J01AA07 | 000060-54-8
5 | Chloramphenicol | J01BA01 | 000056-75-7
6 | Clavulanic acid | J01CR02 | 058001-44-8
7 | Amoxicillin | J01CA04 | 061336-70-7
8 | Ampicillin | J01CA01 | 000069-53-4
9 | Bacampicillin | J01CA06 | 050972-17-3
10 | Mecillinam | J01CA11 | 032887-01-7
11 | Piperacillin | J01CA12 | 061477-96-1
12 | Pivampicillin | J01CA02 | 033817-20-8
13 | Pivmecillinam | J01CA08 | 032886-97-8
14 | Benzylpenicillin | J01CE01 | 000061-33-6
15 | Phenoxymethylpenicillin | J01CE02 | 000087-08-1
16 | Cloxacillin | J01CF02 | 000061-72-3
17 | Dicloxacillin | J01CF01 | 003116-76-5
18 | Flucloxacillin | J01CF05 | 005250-39-5
19 | Tazobactam | J01CG02 | 089786-04-9
20 | Cefadroxil | J01DB05 | 050370-12-2
21 | Cefalexin | J01DB01 | 015686-71-2
22 | Cefixime | J01DD08 | 079350-37-1
23 | Cefotaxime | J01DD01 | 063527-52-6
24 | Cefoxitin | J01DC01 | 035607-66-0
25 | Cefpodoxime | J01DD13 | 087239-81-4
26 | Ceftibuten | J01DD14 | 097519-39-6
27 | Ceftriaxone | J01DD04 | 073384-59-5
28 | Cefuroxime | J01DC02 | 055268-75-2
29 | Cilastatin | J01DH51a | 082009-34-5
30 | Imipenem | J01DH51b | 064221-86-9
31 | Loracarbef | J01DC08 | 076470-66-1
32 | Aztreonam | J01DF01 | 078110-38-0
33 | Meropenem | J01DH02 | 096036-03-2
34 | Ertapenem | J01DH03 | 153832-46-3
35 | Sulfadiazine | J01EC02 | 000068-35-9
36 | Sulfamethoxazole | J01EC01 | 000723-46-6
37 | Trimethoprim | J01EA01 | 000738-70-5
38 | Azithromycin | J01FA10 | 083905-01-5
39 | Clarithromycin | J01FA09 | 081103-11-9
40 | Clindamycin | J01FF01 | 018323-44-9
41 | Erythromycin | J01FA01 | 000114-07-8
42 | Roxithromycin | J01FA06 | 080214-83-1
43 | Quinupristin | J01FG02a | 120138-50-3
44 | Dalfopristin | J01FG02b | 112362-50-2
45 | Amikacin | J01GB06 | 037517-28-5
46 | Gentamicin | J01GB03 | 001403-66-3
47 | Netilmicin | J01GB07 | 056391-56-1
48 | Tobramycin | J01GB01 | 032986-56-4
49 | Ciprofloxacin | J01MA02 | 085721-33-1
50 | Levofloxacin | J01MA12 | 100986-85-4
51 | Moxifloxacin | J01MA14 | 151096-09-2
52 | Norfloxacin | J01MA06 | 070458-96-7
53 | Ofloxacin | J01MA01 | 082419-36-1
54 | Fusidic acid | J01XC01 | 006990-06-3
55 | Metronidazole | J01XD01 | 000443-48-1
56 | Tinidazole | J01XD02 | 019387-91-8
57 | Nitrofurantoin | J01XE01 | 000067-20-9
58 | Linezolid | J01XX08 | 165800-03-3
59 | Methenamine | J01XX05 | 000100-97-0
60 | Amphotericin B | J02AA01 | 001397-89-3
61 | Ketoconazole | J02AB02 | 065277-42-1
62 | Fluconazole | J02AC01 | 086386-73-4
63 | Itraconazole | J02AC02 | 084625-61-6
64 | Voriconazole | J02AC03 | 137234-62-9
65 | Flucytosine | J02AX01 | 002022-85-7
66 | Rifabutin | J04AB04 | 072559-06-9
67 | Rifampicin | J04AB02 | 013292-46-1
68 | Isoniazid | J04AC01 | 000054-85-3
69 | Ethambutol | J04AK02 | 000074-55-5
70 | Pyrazinamide | J04AK01 | 000098-96-4
71 | Aciclovir | J05AB01 | 059277-89-3
72 | Famciclovir | J05AB09 | 104227-87-4
73 | Ganciclovir | J05AB06 | 082410-32-0
74 | Ribavirin | J05AB04 | 036791-04-5
75 | Valaciclovir | J05AB11 | 124832-26-4
76 | Valganciclovir | J05AB14 | 175865-60-8
77 | Amprenavir | J05AE05 | 161814-49-9
78 | Indinavir | J05AE02 | 150378-17-9
79 | Nelfinavir | J05AE04 | 159989-64-7
80 | Ritonavir | J05AE03 | 155213-67-5
81 | Saquinavir | J05AE01 | 127779-20-8
82 | Abacavir | J05AF06 | 136470-78-5
83 | Didanosine | J05AF02 | 069655-05-6
84 | Lamivudine | J05AF05 | 134678-17-4
85 | Stavudine | J05AF04 | 003056-17-5
86 | Tenofovir disoproxil | J05AF07 | 201341-05-1
87 | Zalcitabine | J05AF03 | 007481-89-2
88 | Zidovudine | J05AF01 | 030516-87-1
89 | Efavirenz | J05AG03 | 154598-52-4
90 | Nevirapine | J05AG01 | 129618-40-2
91 | Oseltamivir | J05AH02 | 196618-13-0
92 | Zanamivir | J05AH01 | 139110-80-8
(a) ATC code as defined by WHO. ATC code groups: J01A, tetracyclines; J01B, amphenicols; J01C, beta-lactam antibacterials, penicillins; J01D, other beta-lactam antibacterials; J01E, sulfonamides and trimethoprim; J01F, macrolides, lincosamides and streptogramins; J01G, aminoglycoside antibacterials; J01M, quinolone antibacterials; J01X, other antibacterials; J02A, antimycotics for systemic use; J04A, drugs for treating tuberculosis; J05A, antivirals for systemic use. (b) Chemical Abstracts Service registry number.
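The class sizes given in the Methods can be cross-checked against Table 1 by grouping the ATC codes on their first four characters (for example, J01C). The Python sketch below does this with the standard library only; the code list is transcribed from Table 1 in compound order, and the helper name `count_by_atc_class` is our own.

```python
from collections import Counter

# ATC codes of compounds 1-92, in the order of Table 1.
# The a/b suffixes distinguish the two components of combination products.
ATC_CODES = """
J01AA02 J01AA04 J01AA06 J01AA07 J01BA01 J01CR02 J01CA04 J01CA01 J01CA06 J01CA11
J01CA12 J01CA02 J01CA08 J01CE01 J01CE02 J01CF02 J01CF01 J01CF05 J01CG02 J01DB05
J01DB01 J01DD08 J01DD01 J01DC01 J01DD13 J01DD14 J01DD04 J01DC02 J01DH51a J01DH51b
J01DC08 J01DF01 J01DH02 J01DH03 J01EC02 J01EC01 J01EA01 J01FA10 J01FA09 J01FF01
J01FA01 J01FA06 J01FG02a J01FG02b J01GB06 J01GB03 J01GB07 J01GB01 J01MA02 J01MA12
J01MA14 J01MA06 J01MA01 J01XC01 J01XD01 J01XD02 J01XE01 J01XX08 J01XX05 J02AA01
J02AB02 J02AC01 J02AC02 J02AC03 J02AX01 J04AB04 J04AB02 J04AC01 J04AK02 J04AK01
J05AB01 J05AB09 J05AB06 J05AB04 J05AB11 J05AB14 J05AE05 J05AE02 J05AE04 J05AE03
J05AE01 J05AF06 J05AF02 J05AF05 J05AF04 J05AF07 J05AF03 J05AF01 J05AG03 J05AG01
J05AH02 J05AH01
""".split()


def count_by_atc_class(codes):
    """Count compounds per ATC therapeutic subgroup (first four characters)."""
    return Counter(code[:4] for code in codes)


counts = count_by_atc_class(ATC_CODES)
for atc_class, n in sorted(counts.items()):
    print(atc_class, n)
print("total", sum(counts.values()))
```

The printed counts should match the n values given for each of the 12 classes in the Methods and sum to 92.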
Methods Dataset. In this study, 97 chemicals used to treat human infectious diseases in Sweden were initially considered, of which 92 were retained for analysis (see Results and Discussion). These 92 compounds are divided into 12 groups according to similarities in their chemical structure and pharmaceutical use, following the Anatomical Therapeutic Chemical (ATC) classification code, and include most of the major chemicals used to treat infectious diseases worldwide. The 12 groups are tetracyclines (ATC code J01A, n = 4), amphenicols (ATC code J01B, n = 1), beta-lactam antibacterials, penicillins (ATC code J01C, n = 14), other beta-lactam antibacterials (ATC code J01D, n = 15), sulfonamides and trimethoprim (ATC code J01E, n = 3), macrolides, lincosamides and streptogramins (ATC code J01F, n = 7), aminoglycoside antibacterials (ATC code J01G, n = 4), quinolone antibacterials (ATC code J01M, n = 5), other antibacterials (ATC code J01X, n = 6), antimycotics for systemic use (ATC code J02A, n = 6), drugs for treating tuberculosis (ATC code J04A, n = 5), and antivirals for systemic use (ATC code J05A, n = 22). All the chemicals included in this investigation are listed in Table 1, together with their Chemical Abstracts Service Registry numbers and individual ATC codes. Molecular Descriptors. The input files for calculating descriptors, which contained information on atom and bond types, connectivity, partial charges, and atomic spatial coordinates corresponding to the minimum-energy conformation of each molecule, were obtained using the molecular mechanics method of Allinger (MM+) in the software package HyperChem (Hypercube). These conformations were further refined using the semiempirical AM1 Hamiltonian, with the Polak-Ribiere algorithm used for the optimization, a convergence limit of 0.01 kcal/mol, and a maximum of 10 000 calculation cycles. Quantum-chemical descriptors such as HOMO (highest occupied molecular orbital), LUMO (lowest unoccupied molecular orbital), HOMO-LUMO gap, and atomic energies were calculated for all the studied compounds. The logarithm of the octanol/water partition coefficient (log Kow), calculated using the software LogKowWIN (U.S. EPA), was also included as a variable describing the hydrophobic properties of the antibiotics. A total of 893 molecular descriptors of various kinds were calculated with the software DRAGON (v.5.4) (Talete, Italy). The meaning of these molecular descriptors and the calculation procedures are summarized in the DRAGON software and explained in detail, with references to the relevant literature, in Todeschini and Consonni (2000) (26).
TABLE 2. Descriptors Included in the Simplified PCA Structural Analysis
DRAGON constitutional descriptors
AMW (average molecular weight)
Me (mean atomic Sanderson electronegativity)
Ms (mean electrotopological state)
Mv (mean atomic van der Waals volume)
MW (molecular weight)
nAB (number of aromatic bonds)
nBM (number of multiple bonds)
nBnz (number of benzene-like rings)
nCIC (number of rings)
nDB (number of double bonds)
nN (number of nitrogen atoms)
nO (number of oxygen atoms)
nR04 (number of 4-membered rings)
nR06 (number of 6-membered rings)
nS (number of sulfur atoms)
Principal Component Analysis. PCA is a statistical method used to extract and visualize the major patterns and trends in a multivariate data matrix (22). In a PCA, the information in the dataset is reduced to a few new descriptive variables, the principal components. These principal components describe the systematic variation in the dataset and can be used to generate graphical presentations of the chemical variation in so-called score plots. Chemicals (here, antibiotics) with similar properties are located close to each other in a score plot, which simplifies the interpretation of general trends and correlations between chemicals, as well as outlier detection. In addition, the variables are presented in a loading plot. This plot complements the score plot and indicates which variables are responsible for the positions of the antibiotics in it. Compounds and variables that are projected onto the same positions in the score and loading plots are correlated. Several statistical values that indicate the validity of the generated model and the statistical significance of the results can be calculated from a PCA. In particular, the R2X value indicates the percentage of the total variance explained by each principal component.
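A PCA of the kind described above can be sketched with the NIPALS algorithm. This is a minimal plain-Python illustration, not the Simca-P implementation: it assumes unit-variance scaling of every descriptor (a common chemometric default; the preprocessing is not stated in the text), returns the scores that a score plot displays and the loadings that a loading plot displays, and reports R2X for each component as the fraction of the total sum of squares it captures. The descriptor matrix is synthetic.

```python
import math
import random


def autoscale(X):
    """Center each column to zero mean and scale it to unit variance."""
    n, k = len(X), len(X[0])
    cols = []
    for j in range(k):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        cols.append([(v - mean) / sd if sd > 0 else 0.0 for v in col])
    return [[cols[j][i] for j in range(k)] for i in range(n)]


def nipals_pca(X, n_components, tol=1e-10, max_iter=500):
    """Return (scores, loadings, r2x) for an already scaled matrix X."""
    E = [row[:] for row in X]  # residual matrix, deflated after each component
    n, k = len(E), len(E[0])
    ss_total = sum(v * v for row in E for v in row)
    scores, loadings, r2x = [], [], []
    for _ in range(n_components):
        # start from the column with the largest remaining sum of squares
        j0 = max(range(k), key=lambda j: sum(E[i][j] ** 2 for i in range(n)))
        t = [E[i][j0] for i in range(n)]
        for _ in range(max_iter):
            tt = sum(v * v for v in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]  # unit-length loading vector
            t_new = [sum(E[i][j] * p[j] for j in range(k)) for i in range(n)]
            converged = sum((a - b) ** 2 for a, b in zip(t_new, t)) < tol
            t = t_new
            if converged:
                break
        # with a unit-length loading, the sum of squares captured is t't
        r2x.append(sum(v * v for v in t) / ss_total)
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        scores.append(t)
        loadings.append(p)
    return scores, loadings, r2x


# Synthetic example: three descriptors share one latent "size" factor z.
random.seed(1)
z = [random.gauss(0, 1) for _ in range(30)]
w = [random.gauss(0, 1) for _ in range(30)]
X = [[zi + 0.1 * random.gauss(0, 1),
      2 * zi + 0.1 * random.gauss(0, 1),
      -zi + 0.1 * random.gauss(0, 1),
      wi + 0.1 * random.gauss(0, 1),
      random.gauss(0, 1)] for zi, wi in zip(z, w)]
scores, loadings, r2x = nipals_pca(autoscale(X), n_components=5)
print([round(v, 3) for v in r2x])
```

Because the three correlated columns dominate, the first component captures most of the variance, and the R2X values of all five components together account for the whole scaled matrix.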
In this study, PCA was performed using Simca-P 10.0 software (Umetrics, Sweden). Statistical Experimental Design. Statistical experimental design is a very useful method for selecting chemicals to include in a training set for the development of QSAR models. In statistical experimental design, several parameters are varied simultaneously and systematically to obtain as much information as possible from as few experiments or observations as possible (or, often, to optimize the number of experiments needed to acquire sufficient information to meet specific aims within acceptable time and/or cost constraints). In a factorial design, observations are selected at high (+) and low (-) levels of each variable and in all possible combinations (27). Center points are also included so that the statistical variation of the model, and nonlinear relationships between the chemicals and variables, can be detected and described. Center points are located at the mean of the high and low settings for each variable, and usually three to four center-point observations are selected. In this study, four variables, represented by the score values extracted from the PCA, were used to generate a factorial experimental design (24) to select a training set of 20 antibiotics. The compounds grouped according to the factorial design are given in the Supporting Information (Supporting Information Table SI 1).
TABLE 2. (Continued)
quantum chemical descriptors
HOMO
LUMO
HOMO-LUMO gap
atomic energy
binding energy
core energy
electronic energy
heat of formation
physico-chemical property
log Kow
Partial Least-Squares Projections to Latent Structures. A partial least-squares projection to latent structures (PLS) model is calculated in a similar way to a PCA. The difference is that PLS relates the variation in a descriptor matrix (X) to the variation in a response matrix (Y): the latent variables are computed to maximize the covariance between X and Y, so that the response Y can be described and predicted from the data in X. In this study, PLS was used to calculate quantitative structure-activity relationships (i.e., to predict the response of antibiotics from their chemical properties). Several statistical values are generated when a PLS model is calculated, to check that the model is valid and statistically significant. Important values for a PLS model are the explained variation in X (R2X) and Y (R2Y), the cross-validated explained variance (Q2), the root mean squared error of the estimate (RMSEE), and the root mean squared error of prediction (RMSEP). RMSEE gives a direct indication of the error of the fitted model and RMSEP of the error of the predicted values; both are expressed in the units of the response. Two further important parameters, DModX and DModY, define the objects' distances to the model in the X and Y space, respectively, allowing the identification of structural or response outliers. All PLS models were constructed and evaluated using Simca-P 10.0 software (Umetrics, Sweden). Ultimate Biodegradability. The ultimate biodegradability (i.e., the readiness with which a parent compound is transformed to carbon dioxide and water) was estimated for all the studied antibiotics using the Biodegradation Probability Program BIOWIN3 (U.S. EPA) (28-30). The response from BIOWIN3 was used to establish a QSAR model with the aim of verifying the chemical representativity of the training set. As more data on various physicochemical and biological properties become available through the inclusion of the selected antibiotics in testing programmes, more relevant QSAR models could be developed. The biodegradation estimates from BIOWIN are based on molecular fragment constants that were developed using multiple linear and nonlinear regression analyses and expert judgments for 200 diverse chemicals. The ultimate biodegradability of each chemical is expressed on a scale from 1 to 5, corresponding to the following timescales: 5 = hours, 4 = days, 3 = weeks, 2 = months, 1 = longer.
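The QSAR step can be illustrated with a one-component PLS1 regression, the model size reported in the Results. The plain-Python sketch below autoscales X with training-set statistics, mean-centers y, and computes R2Y, Q2, RMSEE, and RMSEP as defined above. Several details are assumptions rather than the Simca-P procedure: Q2 uses leave-one-out cross-validation, RMSEE divides by N - A - 1 with A = 1 component, and the data are synthetic stand-ins for the 24 descriptors and the BIOWIN3 scores.

```python
import math
import random

N_COMPONENTS = 1  # the reported QSAR is a one-component model


def fit_pls1(X, y):
    """Fit a one-component PLS1 model and return a prediction function.

    X is autoscaled with the training statistics; y is mean-centered.
    """
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    sds = []
    for j in range(k):
        var = sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)
        sds.append(math.sqrt(var) if var > 0 else 1.0)
    y_mean = sum(y) / n
    Xs = [[(row[j] - means[j]) / sds[j] for j in range(k)] for row in X]
    yc = [v - y_mean for v in y]
    # weight vector w: the X direction with maximal covariance with y
    w = [sum(Xs[i][j] * yc[i] for i in range(n)) for j in range(k)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]
    t = [sum(r[j] * w[j] for j in range(k)) for r in Xs]  # X scores
    q = sum(a * b for a, b in zip(t, yc)) / sum(a * a for a in t)  # y-loading

    def predict(x):
        xs = [(x[j] - means[j]) / sds[j] for j in range(k)]
        return y_mean + q * sum(a * b for a, b in zip(xs, w))

    return predict


def pls_statistics(X_train, y_train, X_test, y_test):
    """R2Y, leave-one-out Q2, RMSEE and RMSEP for the one-component model."""
    model = fit_pls1(X_train, y_train)
    n = len(y_train)
    y_mean = sum(y_train) / n
    ss_y = sum((v - y_mean) ** 2 for v in y_train)
    ss_res = sum((v - model(x)) ** 2 for x, v in zip(X_train, y_train))
    press = 0.0  # predictive residual sum of squares from leave-one-out refits
    for i in range(n):
        loo = fit_pls1(X_train[:i] + X_train[i + 1:], y_train[:i] + y_train[i + 1:])
        press += (y_train[i] - loo(X_train[i])) ** 2
    sq_err = [(v - model(x)) ** 2 for x, v in zip(X_test, y_test)]
    return {
        "R2Y": 1 - ss_res / ss_y,
        "Q2": 1 - press / ss_y,
        "RMSEE": math.sqrt(ss_res / (n - N_COMPONENTS - 1)),
        "RMSEP": math.sqrt(sum(sq_err) / len(sq_err)),
    }


def make_set(n, n_descriptors=4):
    """Synthetic descriptors and a response driven by the first two of them."""
    X = [[random.gauss(0, 1) for _ in range(n_descriptors)] for _ in range(n)]
    y = [1.5 + 0.5 * x[0] - 0.3 * x[1] + random.gauss(0, 0.1) for x in X]
    return X, y


random.seed(7)
X_train, y_train = make_set(20)  # stands in for the 20-compound training set
X_test, y_test = make_set(72)    # stands in for the 72-compound validation set
stats = pls_statistics(X_train, y_train, X_test, y_test)
print({name: round(value, 3) for name, value in stats.items()})
```

Keeping the scaling parameters from the training set alone, as the `predict` closure does, mirrors the idea that the validation compounds play no part in building the model.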
FIGURE 1. (a) Score plot of the first two principal components from the simplified PCA performed using 24 descriptors for the 92 antibiotics; (b) Loading plot of the first two principal components from the simplified PCA performed using 24 descriptors for the 92 antibiotics.
Results and Discussion The original study set of 97 compounds registered as antibiotics for use in Sweden in 2004 was initially reduced to 92 on the basis of their structural characteristics. Two compounds, teicoplanin and vancomycin, were excluded because their molecular weights exceeded 1400; they were identified as outliers in the PCA relative to the remaining antibiotics, whose molecular weights ranged from 123 to 1022. In addition, two inner salts (cefepime and ceftazidime) and one salt (sodium foscarnet) among the 97 antibiotics were not considered, since their calculated descriptors would not be comparable with those of the other compounds. Overall, the considered group of antibiotics covers a wide range of chemical properties, including various functional groups, aromatic and nonaromatic rings, and heterocyclic structures. In this dataset, the numbers of hydrogen bond donors and acceptors vary from 0 to 17 and from 4 to 19, respectively, the estimated Kow values span 15 orders of magnitude, and the water solubility ranges from sparingly soluble to very soluble. This chemical variation was studied in more detail using a multitude of chemical descriptors and PCA. PCA on the Complete Descriptor Set. A first PCA (the complete PCA) was obtained for the 92 compounds using all 901 descriptors calculated with the software HyperChem, DRAGON, and LogKowWIN to describe their chemical variation. The first two principal components explained 59% of the total variation (Supporting Information, Figure SI 1). This PCA summarizes the total structural variation among the antibiotics and groups them according to their pharmaceutical uses (macrolides, aminoglycosides, drugs for treating tuberculosis, and some sub-groups of antivirals). However, the large number of descriptors (901) considered in this model makes structural interpretation difficult. Therefore, to simplify interpretation without losing the most significant information, separate PCA models were developed using the descriptors as input variables, grouped according to their type, their ability to describe 1-D, 2-D, and 3-D structural features, or their interpretability. From these analyses, a condensed set of 24 descriptors was formed, including only descriptors that are easy to interpret and that capture the major information related to the molecules' size, hydrophobicity, and electronic properties (Table 2, Supporting Information Table SI 2).
TABLE 3. Molecular Structures of the 20 Selected Antibiotics
PCA Based on the Simplified Descriptor Set. The PCA based on the simplified set resulted in a four-component model that explained 80% of the chemical variation in the data. The distribution of the compounds in the PC1-PC2 plane (Figure 1a-b) was found to be comparable to that of the complete PCA mentioned above: the specific groups of antibiotics are conserved in the simplified set, despite the large reduction in the number and types of descriptors. The first PC explains 34% of the chemical variation and ranks the antibiotics principally by size, being strongly positively correlated with their molecular weight and core energy, but inversely related to their atomic, binding, and electronic energies. Thus, antibiotics with low molecular weights have low PC1 values and are located on the left side of the plot, while the larger antibiotics have high PC1 values and are located on the right side of the plot. Hydrophobicity is positively correlated with both PC1 and PC2, so the most hydrophobic chemicals in the dataset tend to be located in the upper right part of the score plot. The second PC explains 24% of the original information and ranks the chemicals principally according to the presence of multiple bonds, aromatic character, and their LUMO values. This means that antibiotics with complex molecular structures, including aromatic rings, high numbers of multiple bonds, and negative LUMO values, tend to have positive PC2 values. Several different groups of antibiotics can be identified in the PC1 versus PC2 plot. Most of the beta-lactams, quinolone derivatives, tetracyclines, protease inhibitors (an antiviral sub-group), and drugs for treating tuberculosis such as rifampicin and rifabutin are found in the upper-central part of the graph. Aminoglycosides, macrolides, and two antiviral sub-groups (nucleotide analogues and neuraminidase inhibitors) are positioned in the lower part of the graph. A few distinctive compounds can also be identified in the plot, including quinupristin (no. 43), which has high PC1 and PC2 values. This compound has a higher molecular weight than any of the other compounds, as well as a high number of aromatic and heteroaromatic rings and double bonds. In addition, amphotericin B (no. 60), rifabutin (no. 66), rifampicin (no. 67), itraconazole (no. 63), and the macrolides have high PC1 values, principally due to their high molecular weights, while their positions along PC2 reflect the presence or absence of aromatic and multiple bonds. Itraconazole (no. 63) has the highest positive PC2 value, due to the presence of both aromatic and nonaromatic rings. On the opposite side of the plot, the nonaromatic macrolides (nos. 38, 39, 41, and 42) and the aminoglycosides (nos. 45, 46, 47, and 48) have low PC2 values, as do ethambutol (no. 69) and methenamine (no. 59), which not only lack aromaticity completely but also have unusual structures.
Ethambutol is an acyclic, saturated compound with a low molecular weight, whereas methenamine, besides being saturated and of low molecular weight, also has an unusual bridged heterocyclic system. The third and fourth PCs provide less information, explaining 14 and 7% of the variation, respectively (Supporting Information, Figure SI 2a-b). However, some antibiotic groups are still conserved in the PC3 versus PC4 plot (quinolone derivatives, nucleotide analogues, macrolides, and beta-lactams), and factors reflecting molecular complexity, such as the presence of aromatic or double bonds, different types of rings, or sulfur and oxygen atoms, explain the distribution of the antibiotics in this space. In general, the chemicals appear to be ranked along the PC3 axis according to the prevalence of double bonds (high PC3 values, e.g., amphotericin B (no. 60)) versus aromatic bonds (low PC3 values, e.g., ketoconazole (no. 61) and itraconazole (no. 63)) in their structures. Moreover, the presence of nonaromatic 6-membered rings, 4-membered rings, and sulfur atoms seems to influence the ranking of the chemicals along both the PC3 and PC4 axes, and the variables nR04 and nS appear to be highly correlated with the beta-lactams, which always contain a 4-membered (beta-lactam) ring and usually at least one sulfur atom (mecillinam (no. 10) and pivmecillinam (no. 13) have the lowest PC4 values). Training Set: Selection Process. A training set of 20 antibiotics was selected using statistical experimental design based on the PCA of the simplified set of 24 descriptors. The first four principal components were used to create a 2⁴ factorial design. This design yields 16 unique design points, each defined by a combination of high and low score values on the four components. One antibiotic was selected from each of these 16 regions, and the selection was complemented with four compounds from the central region of the chemical domain.
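The 2⁴ selection step can be sketched as follows. Each compound's signs on PC1-PC4 assign it to one of the 16 design regions; within each region the sketch keeps the compound farthest from the origin, and it then adds the four remaining compounds closest to the origin as center points, giving 16 + 4 = 20 compounds. The within-region rule is a hypothetical stand-in: in the study, the choice was also balanced against consumption volumes, environmental relevance, and availability, which this code does not model. The scores are random placeholders, not the published PCA scores.

```python
import itertools
import math
import random


def factorial_selection(scores, n_center=4):
    """Pick one compound per sign pattern of a 2^k design plus n_center central ones.

    scores maps a compound name to its k principal-component scores.
    A score of exactly zero is treated as belonging to the '+' level.
    Returns (corner_picks, center_picks); empty design regions are skipped.
    """
    k = len(next(iter(scores.values())))
    dist = {name: math.sqrt(sum(s * s for s in sc)) for name, sc in scores.items()}
    corner_picks = {}
    for pattern in itertools.product((-1, 1), repeat=k):
        members = [name for name, sc in scores.items()
                   if all((s >= 0) == (sign > 0) for s, sign in zip(sc, pattern))]
        if members:
            # hypothetical rule: the member farthest from the origin represents
            # this corner of the design
            corner_picks[pattern] = max(members, key=dist.get)
    chosen = set(corner_picks.values())
    # center points: the not-yet-chosen compounds closest to the origin
    remaining = sorted((name for name in scores if name not in chosen), key=dist.get)
    return corner_picks, remaining[:n_center]


random.seed(3)
toy_scores = {f"cpd{i}": [random.gauss(0, 1) for _ in range(4)] for i in range(1, 93)}
corners, centers = factorial_selection(toy_scores)
print(len(corners), "design regions filled; center points:", centers)
```

With real score data, a region can occasionally be empty, which is one reason a purely statistical pick is usually reviewed by hand before testing.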
The statistical selection was balanced against information on the antibiotics' consumption volumes, environmental relevance, and commercial availability. The training set thus includes antibiotics that have been detected in the environment, e.g., tetracycline (no. 4), chloramphenicol (no. 5), sulfamethoxazole (no. 36), trimethoprim (no. 37), ciprofloxacin (no. 49), and norfloxacin (no. 52) (1-3, 6, 13, 14). The training set also includes antibiotics that have been suggested to have specific effects in the aquatic environment, e.g., ribavirin (no. 74) (15) and ketoconazole (no. 61) (31), and compounds that are readily degraded, e.g., ampicillin (no. 8) (6). Data on the consumption volumes of antibiotics in Sweden were also considered during the selection procedure, and high-volume antibiotics were selected where possible. All 12 classes of antibiotics, as specified by their ATC codes, are represented by at least one compound, and all 20 compounds are available from commercial standard suppliers. The large structural variation of the 20 selected antibiotics can be seen in Table 3: their molecular weights range from 211 to 837, and their estimated log Kow values range from -5.8 to 6.7. This selection of antibiotics should be seen as a possible starting point for screening chemical and biological parameters of a relevant group of chemicals. Further, such a prioritization of antibiotics is useful for guiding future laboratory analyses of their environmental properties, such as persistence, toxicity, and bioaccumulation. The second aim of this selection was to obtain a representative training set for the development of QSAR models. An initial test of this strategy is described below; its main aim is to analyze the chemical representativity of the selected antibiotics, but also to investigate how well these chemicals span the response range of the full set of antibiotics. Training Set: Assessment of Chemical Representativity.
The chemical representativity of the training set was evaluated in a QSAR model with the 24 descriptors of the simplified set as independent (X) variables and the ultimate biodegradability as the response (Y). The model was established using the training set of 20 antibiotics and validated using the remaining 72 compounds (validation set). The calculated one-component model had a cross-validated explained variance (Q2) of 0.59 and a fit (R2Y) of 0.75. The ultimate biodegradability values of all studied antibiotics, generated with BIOWIN, range from 0.01 to 3.25, and the error of the QSAR model was 0.35 (RMSEE) for the training set and 0.48 (RMSEP) for the validation set. The model also passed a permutation test (200 permutations), which indicates that its predictive power is statistically significant. The training set is well distributed along the regression line, covering most of the response domain (Figure 2). Only one compound in this set, tobramycin (no. 48), has a DModX value slightly above the limit (95% confidence interval). The response range of the training set is 1-3, and predictions for antibiotics with estimated ultimate biodegradation values outside this range should be considered extrapolations. Quinupristin (no. 43) and itraconazole (no. 63) are estimated by BIOWIN to have ultimate biodegradation times longer than months. These compounds have bulky structures and a high number of diverse ring structures (heterocycles and both nonaromatic and aromatic rings). In general, the model indicated that the most significant chemical characteristics are molecular size and the presence of aromatic bonds: small compounds and those with few aromatic structures are expected to be readily degradable, whereas large, complex chemicals or compounds with many aromatic bonds are expected to be less degradable. In the validation set, 8 (nos. 27, 43, 45, 48, 59, 60, 63, 69, and 80) of the 72 chemicals have DModX values above the limit (95% confidence interval).
This means that approximately 90% of the chemicals in the validation set are well represented by the training set compounds. Large deviations from the model were recorded for only a few compounds, viz., amphotericin B (no. 60) and voriconazole (no. 64), which are antimycotics for systemic use (J02A). These deviations between the BIOWIN estimate and the QSAR model can be attributed to the aromatic fluorines of voriconazole (no. 64), which have a large impact on the BIOWIN estimate, and to the large molecular weight and complex structure of amphotericin B (no. 60). The QSAR model based on the simplified descriptor set is not tailored to these characteristics, which most likely explains the low precision of these predictions.
FIGURE 2. BIOWIN estimates versus QSAR predicted biodegradability values. 5 = hours, 4 = days, 3 = weeks, 2 = months, and 1 = longer.
The BIOWIN models are frequently used in regulatory contexts for classifying the persistence of chemicals (29, 32). Recently, the validity of the six BIOWIN models was assessed on new industrial chemicals, and a combination of BIOWIN3 and BIOWIN6 was found to yield the highest overall predictivity (29). BIOWIN3, which was used here to rank the antibiotics, is based on expert judgments and is not tailored to predicting the biodegradation of antibiotics. The set of 200 chemicals used by the experts for calibrating the model covered a large chemical variation and included a number of pharmaceuticals. According to procedures agreed within the EU, BIOWIN3 can be used to classify chemicals as not readily biodegradable if their scores are below 2.2 (32). Applying this cutoff, 50 of the studied antibiotics were classified as readily degradable and 42 as not readily degradable. Data on the biodegradation of antibiotics are limited, and it is thus difficult to validate either the BIOWIN estimates or the QSAR model predictions. Alexy et al. found that none of the 18 antibiotics they studied were readily biodegradable when assessed using the OECD 301D protocol (33). However, the BIOWIN scores of the tested compounds ranged only from 1.5 to 3, so only limited biodegradation would be expected, since this simple closed bottle test runs for a fixed period of 28 days.
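The EU screening rule quoted above amounts to a threshold on the BIOWIN3 score. The sketch below applies the 2.2 cutoff and, for readability, attaches the nearest label from the 1-5 timescale defined in the Methods; mapping a score to the nearest integer is our own display simplification, and BIOWIN's own output labels may differ. Only the erythromycin score (1.2) comes from the text; the other two entries are hypothetical.

```python
TIMESCALE = {5: "hours", 4: "days", 3: "weeks", 2: "months", 1: "longer"}
NOT_READY_BELOW = 2.2  # EU screening cutoff for BIOWIN3 quoted in the text


def classify_biowin3(score):
    """Return (nearest timescale label, readily degradable?) for a BIOWIN3 score.

    Rounding to the nearest integer is a simplification for display only;
    Python's round() sends exact halves to the even neighbour.
    """
    nearest = min(max(round(score), 1), 5)  # clamp onto the 1-5 scale
    return TIMESCALE[nearest], score >= NOT_READY_BELOW


example_scores = {
    "erythromycin": 1.2,                 # value quoted in the text
    "compound A (hypothetical)": 2.6,
    "compound B (hypothetical)": 2.2,
}
for name, score in example_scores.items():
    label, ready = classify_biowin3(score)
    verdict = "readily" if ready else "not readily"
    print(f"{name}: {score} (~{label}) -> {verdict} degradable")

n_ready = sum(ready for _, ready in map(classify_biowin3, example_scores.values()))
print(n_ready, "of", len(example_scores), "classified as readily degradable")
```

Applied to all 92 BIOWIN3 scores, the same boolean test is what splits the set into the readily and not readily degradable groups counted in the text.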
The predicted high persistence (BIOWIN score 1.2) of erythromycin (no. 41) has recently been confirmed experimentally in aquaculture sediments, where full mineralization was reached only after more than 150 days (34). Some of the most prescribed classes of antibiotics were ranked as moderately persistent, e.g., the aminoglycosides (nos. 45-48), trimethoprim (no. 37), the tetracyclines (nos. 1-4), and the quinolone antibacterials (nos. 49-53). These results agree with data on residue levels of these compounds detected in various environmental samples (2, 3, 6). In comparison, beta-lactams, which are also prescribed in high quantities but are predicted to be less persistent, were not detected in sewage water effluents (6, 33). These findings, along with the BIOWIN estimates, stress the importance of experimental studies on the fate and ecological effects of antibiotics. The established QSAR model indicated that the selected training set of 20 compounds represents the chemical variation of the studied antibiotics. These compounds are suggested for inclusion in future screening programs to assess the environmental impact of this highly relevant class of chemicals.
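The persistence ranking discussed above can be illustrated by placing continuous BIOWIN3 scores on the ordinal time-frame scale of Figure 2 (5 = hours, 4 = days, 3 = weeks, 2 = months, 1 = longer) and sorting compounds from lowest to highest score. As a simplifying assumption, this sketch rounds each score to the nearest level, which is not necessarily how BIOWIN3 itself labels intermediate scores; the erythromycin score of 1.2 is taken from the text, while compounds `X` and `Y` and their scores are hypothetical.

```python
from typing import Dict, List, Tuple

# Ordinal time-frame scale from the Figure 2 caption.
TIMEFRAMES = {5: "hours", 4: "days", 3: "weeks", 2: "months", 1: "longer"}


def timeframe(score: float) -> str:
    """Nearest-level label for a continuous score (simplifying assumption).

    The score is clamped to [1, 5] and rounded half-up; Python's round()
    would round 2.5 down to 2 (banker's rounding), so int(x + 0.5) is used.
    """
    clamped = min(5.0, max(1.0, score))
    return TIMEFRAMES[int(clamped + 0.5)]


def rank_by_persistence(scores: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Sort compounds from most persistent (lowest score) to least persistent."""
    return sorted(
        ((name, s, timeframe(s)) for name, s in scores.items()),
        key=lambda item: item[1],
    )


# Erythromycin's score (1.2) is quoted in the text; X and Y are hypothetical.
scores = {"Y": 4.1, "erythromycin": 1.2, "X": 2.6}
for name, s, label in rank_by_persistence(scores):
    print(f"{name:<13} {s:4.1f}  {label}")
```

Sorting on the raw score rather than on the rounded label keeps compounds that share a time-frame level in a meaningful order.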
Acknowledgments
We thank the Swedish Medical Products Agency for financial support and Apoteket AB, Sweden, for helpful discussions. We also thank the University of Insubria for financially supporting Ester Papa during her period of study at the Department of Chemistry, Environmental Chemistry, Umeå University.
Supporting Information Available
Additional raw data and plots from the PCA and the PLS models. This material is available free of charge via the Internet at http://pubs.acs.org.
Literature Cited
(1) Golet, E. M.; Alder, A. C.; Hartmann, A.; Ternes, T. A.; Giger, W. Trace determination of fluoroquinolone antibacterial agents in urban wastewater by solid-phase extraction and liquid chromatography with fluorescence detection. Anal. Chem. 2001, 73, 3632-3638.
(2) Kolpin, D. W.; Furlong, E. T.; Meyer, M. T.; Thurman, E. M.; Zaugg, S. D.; Barber, L. B.; Buxton, H. T. Pharmaceuticals, hormones, and other organic wastewater contaminants in U.S. streams, 1999-2000: A national reconnaissance. Environ. Sci. Technol. 2002, 36, 1202-1211.
(3) Daughton, C. G.; Ternes, T. A. Pharmaceuticals and personal care products in the environment: Agents of subtle change. Environ. Health Persp. 1999, 107, 907-938.
(4) Martindale: The Complete Drug Reference, 34th ed.; Sweetman, S. C., Ed.; Pharmaceutical Press (publications division of the Royal Pharmaceutical Society of Great Britain): London, 2004.
(5) McArdell, C. S.; Molnar, E.; Suter, M. J.-F.; Giger, W. Occurrence and fate of macrolide antibiotics in wastewater treatment plants and in the Glatt valley watershed, Switzerland. Environ. Sci. Technol. 2003, 37, 5479-5486.
(6) Lindberg, R. H.; Wennberg, P.; Johansson, M. I.; Tysklind, M.; Andersson, B. A. V. Screening of human antibiotic substances and determination of weekly mass flows in five sewage treatment plants in Sweden. Environ. Sci. Technol. 2005, 39, 3421-3429.
(7) Körner, W.; Bolz, U.; Süssmuth, W.; Hiller, G.; Schuller, W.; Hanf, V.; Hagenmaier, H. Input/output balance of estrogenic active compounds in a major municipal sewage plant in Germany. Chemosphere 2000, 40, 1131-1142.
(8) Carballa, M.; Omil, F.; Lema, J. M.; Llompart, M.; Garcia-Jares, C.; Rodriguez, I.; Gomez, M.; Ternes, T. Behaviour of pharmaceuticals, cosmetics and hormones in a sewage treatment plant. Water Res. 2004, 38, 2918-2926.
(9) Ash, R. J.; Iverson, J. L. Antibiotic and Disinfectant Resistant Bacteria in Rivers of the United States. Presented at the NGWA 4th International Conference on Pharmaceuticals and Endocrine Disrupting Chemicals in Water, Minneapolis, Minnesota, U.S., October 13-15, 2004.
(10) Hartmann, A.; Golet, E. M.; Gartiser, S.; Alder, A. C.; Koller, T.; Widmer, R. M. Primary DNA damage but not mutagenicity correlates with ciprofloxacin concentrations in German hospital wastewaters. Arch. Environ. Contam. Toxicol. 1999, 36, 115-119.
(11) Robinson, A. A.; Belden, J. B.; Lydy, M. J. Toxicity of fluoroquinolone antibiotics to aquatic organisms. Environ. Toxicol. Chem. 2005, 24, 423-430.
(12) Kümmerer, K. Significance of antibiotics in the environment. J. Antimicrob. Chemother. 2003, 52, 5-7.
(13) Lindberg, R.; Jarnheimer, P.-Å.; Olsen, B.; Johansson, M.; Tysklind, M. Determination of antibiotic substances in hospital sewage water using solid phase extraction and liquid chromatography/mass spectrometry and group analogue internal standards. Chemosphere 2004, 57, 1479-1488.
(14) Haller, M. Y.; Müller, S. R.; McArdell, C. S.; Alder, A. C.; Suter, M. J.-F. Quantification of veterinary antibiotics (sulfonamides and trimethoprim) in animal manure by liquid chromatography-mass spectrometry. J. Chromatogr. A 2002, 952, 111-120.
(15) Swedish Medical Products Agency (MPA) report. Environmental Effect of Pharmaceuticals and Cosmetic and Hygiene Products (Miljöpåverkan från läkemedel samt kosmetiska och hygieniska produkter); MPA: Uppsala, Sweden, 2004 (in Swedish).
(16) Fent, K.; Weston, A. A.; Caminada, D. Ecotoxicology of human pharmaceuticals. Aquat. Toxicol. 2006, 76, 122-159.
(17) Flaherty, C. M.; Dodson, S. I. Effects of pharmaceuticals on Daphnia survival, growth, and reproduction. Chemosphere 2005, 61, 200-207.
(18) Eriksson, L.; Johansson, E. Multivariate design and modeling in QSAR. Chemom. Intell. Lab. Syst. 1996, 34, 1-19.
(19) Eriksson, L.; Johansson, E.; Müller, M.; Wold, S. On the selection of the training set in environmental QSAR analysis when compounds are clustered. J. Chemom. 2000, 14, 599-616.
(20) Golbraikh, A.; Tropsha, A. Predictive QSAR modeling based on diversity sampling of experimental datasets for the training and test set selection. J. Comput.-Aided Mol. Des. 2002, 16, 357-369.
(21) Gramatica, P.; Pilutti, P.; Papa, E. Validated QSAR prediction of OH tropospheric degradation of VOCs: Splitting into training-test sets and consensus modelling. J. Chem. Inf. Comput. Sci. 2004, 44, 1794-1802.
(22) Jackson, J. E. A User's Guide to Principal Components; John Wiley & Sons: New York, 1991.
(23) Andersson, P. L.; Van der Burght, A. S. A. M.; Van den Bergh, M.; Tysklind, M. Multivariate modeling of polychlorinated biphenyl-induced CYP1A activity in hepatocytes from three different species: Ranking scales and species differences. Environ. Toxicol. Chem. 2000, 19, 1454-1463.
(24) Andersson, P. L.; Maran, U.; Fara, D.; Karelson, M.; Hermens, J. L. M. General and class specific models for prediction of soil sorption using various physicochemical descriptors. J. Chem. Inf. Comput. Sci. 2002, 42, 1450-1459.
(25) Larsson, A.; Johansson, S. M. C.; Pinkner, J. S.; Hultgren, S. J.; Almqvist, F.; Kihlberg, J.; Linusson, A. Multivariate design, synthesis, and biological evaluation of peptide inhibitors of FimC/FimH protein-protein interactions in uropathogenic Escherichia coli. J. Med. Chem. 2005, 48, 935-945.
(26) Todeschini, R.; Consonni, V. Handbook of Molecular Descriptors; Wiley-VCH: Weinheim, Germany, 2000.
(27) Box, G. E. P.; Hunter, W. G.; Hunter, J. S. Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building; John Wiley & Sons: New York, 1978.
(28) Boethling, R. S.; Howard, P.; Meylan, W.; Stiteler, W.; Beauman, J.; Tirado, N. Group contribution method for predicting probability and rate of aerobic biodegradation. Environ. Sci. Technol. 1994, 28, 459-465.
(29) Posthumus, R.; Traas, T. P.; Peijnenburg, W. J. G. M.; Hulzebos, E. M. External validation of EPIWIN biodegradation models. SAR QSAR Environ. Res. 2005, 16, 135-148.
(30) Howard, P. H.; Boethling, R. S.; Stiteler, W. M.; Meylan, W. M.; Hueber, A. E.; Beauman, J. A.; Larosche, M. E. Predictive model for aerobic biodegradability developed from a file of evaluated biodegradation data. Environ. Toxicol. Chem. 1992, 11, 593-603.
(31) Hegelund, T.; Ottosson, K.; Rådinger, M.; Tomberg, P.; Celander, M. C. Effects of the antifungal imidazole ketoconazole on CYP1A and CYP3A in rainbow trout and killifish. Environ. Toxicol. Chem. 2004, 23, 1326-1334.
(32) Technical Guidance Document in support of Commission Directive 93/67/EEC on risk assessment for new notified substances and Commission Regulation (EC) no. 1488/94 on risk assessment for existing substances and Directive 98/8/EC of the European Parliament and of the Council concerning the placing of biocidal products on the market, Part 3; European Chemicals Bureau: Ispra, Italy, 2003.
(33) Alexy, R.; Kümpel, T.; Kümmerer, K. Assessment of degradation of 18 antibiotics in the Closed Bottle Test. Chemosphere 2004, 57, 505-512.
(34) Kim, Y.-H.; Pak, K.; Pothuluri, J. V.; Cerniglia, C. E. Mineralization of erythromycin A in aquaculture sediments. FEMS Microbiol. Lett. 2004, 234, 169-175.
Received for review March 16, 2006. Revised manuscript received November 21, 2006. Accepted December 19, 2006. ES060618U