Environ. Sci. Technol. 2004, 38, 6767-6774
Development of a DNA Microarray Chip for the Identification of Sludge Bacteria Using an Unsequenced Random Genomic DNA Hybridization Method
BYOUNG CHAN KIM, JI HYUN PARK, AND MAN BOCK GU*
Advanced Environmental Monitoring Research Center (ADEMRC), National Research Laboratory on Environmental Biotechnology, Department of Environmental Science and Engineering, Gwangju Institute of Science and Technology (GIST), 1 Oryoung-dong, Puk-gu, Gwangju 500-712, Republic of Korea
A tool, based upon the DNA microarray chip, for the identification of specific bacteria from activated sludge, using the hybridization of genomic DNA with random probes, is described. This chip was developed using the genomic DNAs from Gordonia amarae, the natural filamentous actinomycete that causes sludge foaming and bulking, as well as a nonfilamentous floc forming bacterium (Zoogloea ramigera) and the skin pathogen Mycobacterium peregrinum without any sequence information. The sets of target probes on amine-coated glass were made from a genomic library, constructed with PCR products derived from randomly fragmented genomic DNAs extracted from pure cultures of the three strains. Initial hybridization results, when pure cultures were employed, showed the specificity of the probes as well as the resolution of the system, demonstrating the capabilities of this system to identify specific bacterial strains. The microarray was also tested for its ability to distinguish specific bacteria from among mixed bacterial communities, such as in sludge, soil, or spiked genomic DNA samples. The results showed that the probes are specific, with only mild cross-hybridization occurring in a small number of cases. Furthermore, the chip clearly discriminated the presence of all three strains when they were present alone or together within mixed samples. Moreover, using the spot intensity and DNA hybridization kinetics, the starting genomic DNA concentrations could be estimated relatively well, which would make it possible to predict the number of specific bacteria present within the test samples. Therefore, the random genomic hybridization approach, i.e., without any sequence information available for the probes, is a practical protocol for the identification of and screening for specific bacteria within any complex bacterial community from the environmental samples, such as in activated sludge, although the possibility of cross-hybridization may still exist.
* Corresponding author phone: +82 62 970 2440; fax: +82 62 970 2434; e-mail: [email protected]. DOI: 10.1021/es035398o. Published on Web 11/06/2004. © 2004 American Chemical Society.
Introduction The activated sludge process is an essential process for treating domestic and industrial effluents containing organic
compounds in most wastewater treatment plants (1). This process consists of a mixture of general and specialized microorganisms in the form of a complex enrichment population (2). Although the activated sludge process is accepted as a typical process for degrading organic carbon, it suffers from biological problems, such as foaming, scumming, and bulking, which may seriously obstruct the efficient operation of an entire wastewater treatment strategy (3). Several factors are known to affect sludge foaming and bulking, including the uncontrolled overgrowth of filamentous mycolic acid-containing actinomycetes (mycolata). Nocardia (Gordonia) sp., Microthrix parvicella, and several other type strains have been shown to lead to bulking and foaming. Among these, Nocardia (Gordonia) amarae is the best characterized in terms of its effect on bulking and foaming in the activated sludge process (4-7). Several treatment strategies to prevent these problems exist, such as manipulation of the loading rate and the use of a selector, chlorination, ozonation, and antifoam agents (8). However, these are not long-term solutions because various factors stimulate the growth of unfavorable filamentous bacteria during the sludge process. One bottleneck limiting the use of these strategies in wastewater treatment plants is that no tool exists to monitor the population and dynamics of specific bacteria. Therefore, it is essential to monitor the population of specific sludge bacteria to operate an activated sludge process efficiently and, especially, to prevent bulking and foaming before they occur. To determine a specific bacterial population in activated sludge, many approaches have been employed and tested, such as fatty acid confirmation analysis (9), antibody assays (10), oligonucleotide probe hybridization methods (11), and morphological feature analysis (12). Although morphological analyses could easily be used to identify M.
parvicella, Skermania piniformis, and other distinct bacteria, organisms such as Gordonia are difficult to discriminate with a microscopic approach (8). Both the fatty acid confirmation analysis and the mycolic acid polyclonal antibody assay could be used to determine the G. amarae population in activated sludge; however, their preparation and analytical procedures are too complicated and time-consuming. Also, these methods require a fatty acid profile standard or an antibody for every bacterial strain to be analyzed (9, 10). Another biomolecular technique uses oligonucleotide probes based on the 16S rDNA sequences of specific bacteria. This approach enables the unambiguous phylogenetic analysis of targeted foam-inducing bacteria (11). The sequences of 16S rDNA probes are routinely obtained from a database, but this database is still incomplete and needs to be updated constantly for newly isolated bacteria (13). Also, the selection of rDNA probes for in situ hybridization, or other methods, should be done very carefully to prevent the inclusion of conserved regions, while in most cases the sequences from uncultured sludge bacteria are not available. While rDNA probes have various advantages, including the identification of bacterial strains without the need for an isolated organism, the wide range of foam-inducing bacteria present in activated sludge limits this tool to research only, rather than as a routine quality assurance tool. To guarantee accuracy and assurance, one would need to increase the number of different probes to make a comprehensive analysis in complex communities, such as activated sludge (8). Therefore, a high-throughput screening tool that can handle complex populations easily is needed. Recently, DNA microarray technology, which has been used to analyze global transcriptional profiling, has faced several challenges in the testing and analysis of environmental
samples or toxicants (14-16). Recent applications of the DNA microarray in environmental studies include the detection of pathogens, environmentally functional genes, specific bacteria, viruses, etc. (14, 17-21). Therefore, in the present study, an approach is described that uses randomly fragmented genomic DNA probes (i.e., unsequenced probes) to develop a DNA microarray chip for the identification of specific bacteria in activated sludge. Using pure cultures of G. amarae, M. peregrinum, or Z. ramigera, random libraries were constructed and the inserted fragments amplified to make the target probes. This study focuses on problem-causing bacteria in activated sludge; for system validation with other bacteria, we also included Z. ramigera, which forms flocs and is known to remove organics or heavy metals in the sludge (22), and M. peregrinum, which causes various diseases in people with skin abscesses, such as peritonitis, wound infections, and disseminated disease (23). After the chips were fabricated and initially characterized with pure and mixed bacterial cultures, the relationship between the signal intensity and genomic DNA concentration was determined to make it possible to estimate the DNA concentration and, thus, the bacterial cell number within the sample.
Experimental Section
Bacterial Samples and Genomic DNA Extraction. The bacterial strains used in this study are commonly found in activated sludge, and include G. amarae ATCC 27808T (American Type Culture Collection), Z. ramigera ATCC 19623, M. peregrinum ATCC 14467T, S. piniformis KCTC 9829 (Korean Collection for Type Cultures), Ralstonia eutropha ATCC 17699, Thiobacillus thioparus ATCC 23645, Paracoccus thiocyanatus KCTC 2848, and Pseudomonas putida ATCC 12633T. All strains were cultivated following their culture guidelines. Bacterial cultures were transferred to a 2.0 mL tube with 25% glycerol and stored at -70 °C before use. Activated sludge DNA was extracted from a laboratory-scale operating sludge, which showed no foaming, scumming, or bulking events. The seed of this sludge was from the Gwangju wastewater treatment plant and was acclimated to degrade glucose-based artificial wastewater in the laboratory. For the soil genomic DNA, 0.25 mg of a soil from a closed mine was used. To achieve a more concentrated soil DNA sample, the extracted DNA was concentrated using a Microcon YM-30 (Millipore). To quantify the G. amarae population in the sludge, $10^6$, $10^5$, $10^4$, $10^3$, or $10^2$ cfu of G. amarae was mixed with 1 mL of sludge before the DNA was extracted. Genomic DNA from all samples was extracted using a soil DNA extraction kit (Mobio) with a modified protocol using a cultured cell stock instead of soil particles. The quantity of DNA extracted was determined using a UV spectrophotometer (Ultrospec 3100 pro, Amersham Bioscience) and its quality checked using the 260 nm/280 nm ratio and by agarose gel electrophoresis. The DNA was stored at -20 °C until use.
DNA Microarray Chip Fabrication. The genomic DNA from three different strains (G. amarae, M. peregrinum, and Z. ramigera) was individually purified, and each was fractionated using several pairs of restriction enzymes, i.e., EcoRI/BamHI, HindIII/XhoI, HindIII/SacII, and EcoRI/XhoI (NEB, Beverly, MA).
To obtain fragments between about 200 and 1500 bp, size fractions were purified after agarose gel electrophoresis. The QIAquick gel extraction kit (Qiagen) was used to elute and purify the DNA fragments from the gel. The fragment pools were then ligated into the pPCR-Script Amp vector (Stratagene) and transformed into Escherichia coli DH5α (Invitrogen) to construct a random genomic library for each strain. From each set of enzyme pairs, several hundred colonies were obtained carrying a fusion plasmid with a random fragment of genomic DNA. From these, colonies were selected randomly and their plasmids digested with the same enzymes used in constructing them to confirm insertion of genomic DNA. With positive clones, PCR amplification was done with the T3 and T7 primer pairs, and the amplified fragments were digested with the same restriction enzyme pairs to remove the plasmid-borne flanking regions. After purification, these fragments were denatured and then printed on an amine-coated slide glass. Specific DNA probes from G. amarae, M. peregrinum, and Z. ramigera, a total of 42, 50, and 50, respectively, were spotted in duplicate. The linearized strand from the pPCR-Script Amp vector was used as a positive control (PC), an internal standard, while the bacteriophage λ PR promoter, which was amplified from plasmid pPL450 (24) using the primers 5′-AAA AAC AGG GTA CTC ATA C-3′ and 5′-CCA TAC AAC CTC CTT AGT A-3′, was used as a negative control (Figure 1).
FIGURE 1. DNA microarray chip schematic. The probes were duplicated on each chip as shown by the numbers next to the name of each strain.
Random Priming and Hybridization. In all hybridizations, the genomic DNAs from the three reference strains G. amarae, M. peregrinum, and Z. ramigera (40 ng of DNA each in a 20 µL random priming reaction) were labeled with Cy5-dCTP (Amersham Bioscience) and the sample genomic DNAs labeled with Cy3-dCTP (Amersham Bioscience) using random priming methods with the High Prime DNA labeling kit (Roche, Switzerland). Random priming was performed using the modified protocols with a reaction time suggested by the kit supplier with 10 ng of positive control gene. First, to optimize the time needed for a full random priming reaction, samples were tested after 1, 5, 10, 30, and 60 min at 37 °C and then denatured for 5 min at 100 °C. After random priming, the reaction pools were mixed together and then eluted using a PCR purification kit (Qiagen). The purified mixture was cleaned and centrifuged with 100% ethanol (10000 rpm, 10 min, 4 °C) and then 70% ethanol (10000 rpm, 3 min, 4 °C).
The purified probes were dried at room temperature for 10 min. Using 30 µL of hybridization buffer (6× SSC, 0.2% sodium dodecyl sulfate (SDS), 5× Denhardt’s solution, 0.1 mg/mL denatured salmon sperm DNA), the dried pellets were redissolved for chip hybridization. The hybridization mixture was boiled for 2 min at 100 °C and then injected into the gap between the chip and slide cover glass. Before hybridization, the chips were prehybridized in prehybridization solution (1.75× SSC, 0.1% SDS, 10 mg/mL bovine serum albumin) for 30 min at 65 °C. After the chips were washed using distilled water for 1 min, followed by 100% 2-propanol for 1 min at room temperature, they were dried by centrifugation (600 rpm, 5 min, room temperature). All hybridizations were performed for 16 h at 65 °C. Afterward, the chips were washed once with buffer 1 (2× SSC) at room temperature to remove the cover glass, then for 10 min with buffer 2 (2× SSC, 0.2% SDS), which was prewarmed to 65 °C, and then twice with buffer 3 (0.05× SSC) for 5 min each time, after which the chip was dried by centrifugation (1200 rpm, 3 min, room temperature). To test the effect of differing DNA concentrations on the signal intensity ratio, 240, 120, 30, 3.75, or 0.23 ng of test DNA from G. amarae or Z. ramigera was used for labeling. For the specific detection test, within a mixed genomic community, 40 ng of DNA from G. amarae or Z. ramigera was mixed with DNA purified from S. piniformis, R. eutropha, T. thioparus, P. thiocyanatus, and P. putida (each contributing 40 ng) before random priming. The other mixtures used are described in the footnote for Table 1. To validate that this chip is sensitive even when using environmental samples, we added $10^6$, $10^5$, $10^4$, $10^3$, or $10^2$ cfu of G. amarae cells to 1 mL of sludge. After extraction of the genomic DNA from the mixed stocks, a total of 500 ng of DNA was labeled with Cy3-dCTP.
TABLE 1. Percent of Positive Hybridization for Different Combinations of Test DNA Based on the Cy3 Intensity (a)
DNA chip position   set 1  set 2  set 3  set 4  set 5  set 6  set 7  set 8  set 9  set 10  set 11  set 12
GA                  100    29     0      100    100    0      100    0      0      0       100     0
MP                  0      100    4      4      100    4      4      4      4      0       4       4
ZR                  0      36     98     98     100    0      0      98     0      0       0       98
(a) The test samples are 40 ng of G. amarae genomic DNA (set 1), 40 ng of M. peregrinum genomic DNA (set 2), 40 ng of Z. ramigera genomic DNA (set 3), 40 ng each of G. amarae and Z. ramigera genomic DNA (set 4), 40 ng each of G. amarae, Z. ramigera, and M. peregrinum genomic DNA (set 5), 40 ng each of S. piniformis, R. eutropha, T. thioparus, P. thiocyanatus, and P. putida genomic DNA (set 6), test set 6 + 40 ng of G. amarae genomic DNA (set 7), test set 6 + 40 ng of Z. ramigera genomic DNA (set 8), 120 ng of genomic DNA extracted from sludge (set 9), 30 ng of genomic DNA extracted from soil (set 10), 90 ng of sludge DNA from test set 9 + 30 ng of G. amarae genomic DNA (set 11), and 90 ng of sludge DNA from test set 9 + 30 ng of Z. ramigera genomic DNA (set 12). The threshold for the positive binding to each spot and determination of the percent positive binding are described in the Experimental Section, Scanning and Data Mining. The abbreviations are GA = G. amarae, MP = M. peregrinum, and ZR = Z. ramigera.
Scanning and Data Mining. The microarray chips were scanned with a Genepix 4000B laser scanner (Axon Instruments, Inc.), the images were captured in 16-bit TIFF format at 10 µm resolution, and intensity analysis was performed with GenepixPro 3.0 software (Axon Instruments). The intensity values of the test and reference DNAs were derived from a median value of the pixel-by-pixel intensity, while the ratio between the two was calculated and corrected, with the correction factor derived from the positive control ratio Cy5-PC/Cy3-PC. The real intensity values were calculated by subtracting the background intensity values from the pixel-by-pixel intensities obtained from the software, giving the spot intensity. Spots flagged by the software were excluded from further analysis. The hybridization signal ratio between the reference DNA and sample DNA was calculated (Cy3-test/Cy5-sample) and normalized with the positive control ratio Cy5-PC/Cy3-PC. Normalized signal ratios were calculated by multiplying Cy3-test/Cy5-sample and Cy5-PC/Cy3-PC for all spots.
Also, the Cy3 intensity of individual spots was corrected by multiplication with Cy5-PC/Cy3-PC. The ratio of Cy5-PC to Cy3-PC from six individual spots on the slide glass was 1.03 ± 0.08 for all experiments. Almost all spots showed a Cy3 intensity of at least 2 times the standard deviation (SD) plus the background intensity, which was calculated using a scan image analysis program, when specific binding occurred. To determine the threshold for positive spots, spots showing signal intensity values of more than 2(SD) plus the background signal after hybridization in the Cy3 channel were considered positive for hybridization. The negative control in both the Cy3 and Cy5 showed no intensity value and was always below the threshold value of 2(SD) plus the background signal. The
percent binding as listed in Table 1 was calculated by dividing the number of probes showing positive hybridization by the total number of probes for each individual bacterium. The gradient values were calculated from the regression line plotted using the normalized intensity of the Cy5-labeled probes versus that of the Cy3-labeled probes. Microsoft Excel and SigmaPlot (SPSS) were used for all statistical calculations.
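The spot-calling and gradient steps described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' GenePix, Excel, or SigmaPlot workflow: the function names (`positive_spots`, `percent_binding`, `gradient_value`) are hypothetical, the intensities are invented toy numbers, and the background and its SD are supplied as single values for simplicity.

```python
import numpy as np

def positive_spots(cy3, background, background_sd, n_sd=2.0):
    """True where the Cy3 signal exceeds background + n_sd * SD."""
    cy3 = np.asarray(cy3, dtype=float)
    threshold = np.asarray(background, dtype=float) + n_sd * np.asarray(background_sd, dtype=float)
    return cy3 > threshold

def percent_binding(positive, regions):
    """Percent of positive probes within each chip region (e.g. GA, MP, ZR)."""
    positive = np.asarray(positive, dtype=bool)
    regions = np.asarray(regions)
    return {str(r): float(100.0 * positive[regions == r].mean()) for r in np.unique(regions)}

def gradient_value(cy5, cy3):
    """Slope G and intercept b of the least-squares line cy3 = G * cy5 + b."""
    G, b = np.polyfit(np.asarray(cy5, dtype=float), np.asarray(cy3, dtype=float), 1)
    return float(G), float(b)

# Toy data, invented for illustration only (not values from the paper).
regions = np.array(["GA"] * 4 + ["MP"] * 4)
cy5 = np.array([1000.0, 2000.0, 500.0, 2500.0, 700.0, 650.0, 800.0, 600.0])
cy3 = np.array([500.0, 1000.0, 250.0, 1250.0, 60.0, 55.0, 400.0, 50.0])

pc_ratio = 1.03                      # Cy5-PC / Cy3-PC correction factor (assumed value)
cy3_corr = cy3 * pc_ratio            # PC-corrected Cy3 intensities

hits = positive_spots(cy3_corr, background=50.0, background_sd=10.0)
binding = percent_binding(hits, regions)

# Fit the gradient only on positive spots in the target region, mirroring the
# rejection of nonbinding and cross-hybridized probes before regression.
mask = hits & (regions == "GA")
G, b = gradient_value(cy5[mask], cy3_corr[mask])
print(binding, G, b)
```

In this toy case one of the four MP spots passes the threshold, which plays the role of a cross-hybridized probe and is excluded from the gradient fit by the region mask.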
Results and Discussion
Detection of Specific Bacteria. Recently, specific microbial analyses have been implemented in the fields of clinical research, food pathogen analysis, virus detection, and species analysis with DNA microarray chip technologies in which a target molecule, such as 16S rDNA, some specific genes, or genomic DNA probes, based upon sequence information, were used (21, 25, 26). In this study, a DNA microarray chip was fabricated for the identification of specific bacterial strains using randomly fragmented probes (i.e., without any information about the sequence of the probes). Such a technique could be used to discriminate within the genus level of bacteria, and so was applied to several strains of sludge bacteria that are known to induce operational problems in wastewater treatment plants. Initially, the effect of the priming time was investigated, to ensure the signal was strong enough for intensity analysis, and it was found that the priming reaction required at least 1 h at 37 °C (data not shown). Therefore, all the randomly primed probes were made using these conditions. To test the chip quality and response reproducibility, hybridization was done with the same quantity of genomic DNA labeled with Cy5 for the references and Cy3 for the test samples. The Cy3/Cy5 ratio was 1.04 ± 0.13 for three separate experiments for all probes, while the negative control spots showed no signals. Specific detection is defined as a result where, when bacterium “A” genomic DNA is added to the test samples, only the region corresponding to bacterium A shows a positive Cy3 intensity while the addition of other bacterial DNA results in no Cy3 intensity. After hybridization, if no binding occurs, the intensity of Cy3 should in effect be zero. Otherwise, if the Cy3 intensity were measurable, this would indicate that the labeled probes bind with the target probes on the chip.
To determine the threshold for positive hybridization, we selected spots that had signal intensities of greater than 2(SD) plus the background signal after hybridization. Using 120 ng of DNA from pure cultures of the three strains, positive signals were obtained in the areas corresponding to the probes specific to each bacterium (Figure 2). For the G. amarae specific detection tests, significant Cy3 intensities were observed only in the G. amarae probe spotting regions, while no significant Cy3 intensities were seen in the Z. ramigera and M. peregrinum probe spotting regions (Figure 2A). Although some cross-hybridization occurred, i.e., Z. ramigera DNA showed 98% positive binding within the ZR (Z. ramigera) region with a 4% cross-hybridization within the MP (M. peregrinum) region, it was limited to a few spots (two in this case) or the signal was weak when compared to the responses from the spots corresponding to the target strain (Figure 2B). The Cy3 intensities (equal to the median Cy3 intensity minus the background median Cy3 intensity) differed among probes, which is attributed to differences in the lengths of the immobilized probes and labeled probes. The lowest measured Cy3 intensity in the specific regions is about 50, and the highest one is more than 17 000. As well, the restriction analysis of genomic DNA from strains within the same species, but isolated at different locales, showed different length partitioning (27). Even within the same species, some studies show that similarity in the 16S rDNA sequence does not always indicate conservation of the genomic sequence (20, 27). This was also found with the 16S rDNA sequences of G. amarae and Z. ramigera, which have about 80% homology in their sequences (on the basis of the NCBI sequence alignment using GenBank Accession Nos. X82243 for G. amarae and X74915 for Z. ramigera). Taken together with the random hybridization results, these findings indicate that the use of the random fragmentation and hybridization approach presented in this study to generate specific gene probes would work, even though some cross-hybridization may be expected. In this study, more than 40 probes were used for each strain. In a previous study, Sokal and Sneath (28) suggested that around 60 characters, i.e., probes, would provide enough information for numerical taxonomy with a significant reliability in the results. Although fewer than 60 probes were used in this study, in tests with G. amarae and Z. ramigera, the specificity of the spotted probes demonstrates that the probes used were sufficient to differentiate and detect each of these strains. For the tests with M. peregrinum, although 100% positive binding was observed when DNA from this organism was hybridized to this chip, some cross-hybridization was seen when DNA from other strains was hybridized.
FIGURE 2. Scanning intensity at 532 nm (Cy3). The test was done with 120 ng of G. amarae genomic DNA (A), M. peregrinum genomic DNA (B), or Z. ramigera genomic DNA (C), and the chips were hybridized under the same conditions. The spot ID number refers to the random probes from G. amarae (1-42), M. peregrinum (43-92), and Z. ramigera (93-142). The insets show the scanned images.
Recognition of Specific Bacteria within a Mixed Population. Wastewater sludge consists of complex populations and chemistry. Extraction of genomic DNA could be biased by the presence of inhibitory factors that would extract with the DNA. Therefore, many conventional methods were studied to develop a method for the routine extraction of genomic DNA from activated sludge (29).
In this study, we used a kit (soil DNA extraction kit, Mobio) to extract the genomic DNA from all samples, i.e., pure and mixed cultures, as well as from sludge or soil samples. The quality of the DNA was confirmed using the 260 nm/280 nm ratio and through agarose gel electrophoresis. Various tests were performed using the DNA from the three bacteria (i.e., G. amarae, M. peregrinum, and Z. ramigera) either individually or in a mixed sample (Table 1). For this, we used 40 ng of each strain’s DNA in the 20 µL labeling reaction. Tests with each individual strain’s DNA showed specific binding with the corresponding spots (test sets 1-3). Once again, during the M. peregrinum detection case (test set 2), some cross-hybridization did occur, but the positive signals were observed in the M. peregrinum target region. As expected, on the basis of results with test sets 1-3, the hybridization of a mixed genomic sample, G. amarae and Z. ramigera or all three strains, showed the system was specific for only those strains represented (sets 4 and 5). Furthermore, when more complex samples were tested, the results were clear and distinct, depending upon the bacterial genomic DNA included. For example, set 6 includes the genomic DNA from five different bacteria that are commonly found in activated sludge (8). None of these strains are included on the chip, and yet, the positive signals seen were only 4%, corresponding to two spots showing positive Cy3 signals. When this mixed genomic sample was supplemented with DNA from either G. amarae or Z. ramigera, the regions on the chips corresponding to these two bacteria gave strong responses, with 100% or 98% of the spots, respectively, showing positive Cy3 signals (sets 7 and 8). The signal results obtained after hybridization with Cy3-labeled genomic DNA isolated from activated sludge, which came from an urban wastewater treatment plant and was acclimated in the laboratory for the degradation of artificial wastewater containing glucose, showed signals from only two spots (set 9). The sludge used had no foam, scum, or bulking characteristics. When 90 ng of this sludge DNA was spiked with 30 ng of either G. amarae or Z. ramigera genomic DNA, hybridization was observed in the corresponding areas of the chips (sets 11 and 12). Finally, genomic DNA extracted from a closed mine soil was also tested, but showed no hybridization (set 10). On the basis of these results, a random probe hybridization method, without any sequencing, would be sufficient to discriminate for specific sludge bacteria within a complex population.
Therefore, specific identification should also be possible for one or several strains using this DNA microarray chip, while the use of more reference strains and their DNA on the chip would give a better resolution and consistency (8).
Effect of Genomic DNA Concentration and Cell Number in Environmental Samples on the Signal Intensity. Using the scanned image of the chip, one could differentiate the signals using the naked eye because of the red and green dyes used, which offer a good colorimetric tool to discriminate specific binding. However, with lower DNA concentrations, lower intensities were obtained, which can lead to misinterpretation of the signals when the human eye is used alone, and it is impossible to quantify the target molecules exactly. Figure 3 shows that Cy3 signals were apparent down to 30 ng of G. amarae or Z. ramigera genomic DNA, while, with 3.75 ng, no Cy3 fluorescence is obvious, as if hybridization of the samples to the target probes did not occur. Using a scan analysis program, however, the hybridization intensities could be measured. Therefore, the raw Cy3 intensities were obtained with 240, 120, 30, 3.75, and 0.23 ng of G. amarae and Z. ramigera for both pure samples. For all cases, a scatter plot of the Cy5 intensities versus Cy3 intensities was linear (Figure 4) while the nonbinding Cy3 probes had a gradient value of zero. On the basis of the scatter plots, specific binding was observed at or above 3.75 ng of genomic DNA, which was the lower detection limit of the system since some of the target probes did not show any Cy3 intensity (inset in Figure 4), while no binding was seen with 0.23 ng of DNA. In a previous study, the intensity of a single spot was correlated with the applicable concentration of a specific single probe above 0.1 ng (30); however, the system presented here uses a mixed random probe, which lowers the sensitivity but expands its usefulness.
The intensity of each spot differed because the hybridization efficiency of random priming and the size of the labeled and spotted probe varied. Initially, the gradient values (i.e., the average Cy3/Cy5 ratio) were calculated using all the hybridized probes with a linear regression from the Cy3 and Cy5 intensity scatter plot. Next, the relationship between genomic DNA concentration and the gradient value, for all the spots, was determined. To determine the relationship between the genomic DNA concentration and gradient value, obtained via linear regression, within this study DNA dot hybridization kinetics was used. DNA dot hybridization kinetics is expressed as $H = C(1 - e^{-kFT})$, where $H$ is the amount of hybridized DNA, $C$ is the amount of target DNA immobilized on the supporting materials, $F$ is the amount of target DNA for hybridization, $k$ is a kinetic constant, and $T$ is the hybridization time (30, 31). The gradient value represents the average ratio of the Cy3 and Cy5 intensities for the spots and is expressed as $\bar{G} = H_T/H_R$, where $H_T$ is the average amount of DNA hybridized from the test samples and $H_R$ is that from the reference samples. For the reference samples, the same genomic DNA pools (120 ng of G. amarae, M. peregrinum, and Z. ramigera DNA) were always used along with the same hybridization times and conditions, meaning that $H_R$ should be constant. Therefore, the average gradient value can be expressed as $\bar{G} = C'H_T = C'(1 - e^{-kF_tT})$, where $C'$ is constant, i.e., $1/H_R$. As well, the values of $C$ for the test and reference samples are the same because the immobilized spots are identical for the reference and test DNA samples.
FIGURE 3. Scanned image of the G. amarae (A) and Z. ramigera (B) specific hybridization with different genomic DNA concentrations, 240, 120, 30, 3.75, and 0.23 ng, for random priming. All hybridizations were performed with reference genomic DNA (40 ng each of G. amarae, Z. ramigera, and M. peregrinum) labeled with Cy5 (red). The test genomic DNA from G. amarae or Z. ramigera was labeled with Cy3 (green).
FIGURE 4. Scatterplot of the F635 (Cy5) intensity versus the F532 (Cy3) intensity for the various starting genomic DNA concentrations: 240 ng (closed circles), 120 ng (open circles), 30 ng (closed squares), 3.75 ng (open squares), 0.23 ng (closed triangles). All data points represent the average of three replicates, and the tests were performed in duplicate (six data sets for each spot). The data include both G. amarae and Z. ramigera hybridization cases. Some dots in the figure overlap. The inset box shows a blown up view of the 30, 3.75, and 0.23 ng hybridization results. Gradient values were calculated using a linear relationship, $y = Gx + b$, where $y$ is the F532 intensity, $x$ is the F635 intensity, $G$ is the gradient value, and $b$ is the y intercept. Before calculation of the gradient values, the nonbinding and cross-hybridized probes were rejected. The gradient values were 2.84 ($r^2$ = 0.96, 240 ng), 0.92 ($r^2$ = 0.94, 120 ng), 0.57 ($r^2$ = 0.94, 30 ng), 0.037 ($r^2$ = 0.95, 3.75 ng), and 0 (no binding, 0.23 ng).
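As a worked reading of the kinetic argument, the ratio of test to reference hybridization can be written out explicitly. This is a sketch under assumptions rather than the authors' derivation: it applies the same dot hybridization expression to both channels, and the symbols $F_R$ (amount of reference DNA) and $C''$ (the resulting constant) are introduced here for illustration and do not appear in the source.

```latex
\begin{align*}
H_T &= C\left(1 - e^{-kF_tT}\right), \qquad
H_R = C\left(1 - e^{-kF_RT}\right) \\
\bar{G} &= \frac{H_T}{H_R}
       = \frac{1 - e^{-kF_tT}}{1 - e^{-kF_RT}}
       = C''\left(1 - e^{-kF_tT}\right),
\qquad C'' = \frac{1}{1 - e^{-kF_RT}} \\
\bar{G} &\approx C''\,k\,T\,F_t \quad \text{when } kF_tT \ll 1
\end{align*}
```

Under these assumptions the immobilized amount $C$ cancels because the same spots serve both channels, so the gradient depends on the test DNA amount $F_t$ alone once the reference amount and hybridization time are fixed, and it is approximately linear in $F_t$ at low DNA amounts.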
FIGURE 5. Regression plot of the gradient values and genomic DNA concentrations. All data points represent the average of three replicates. The plot is represented by the sigmoid curve (solid line), gradient value ) 22779.3754(1 - e(-4.8724 × 10-7[genomic DNA concentration]), with an r2 value of 0.96 and P < 0.001, obtained with SigmaPlot. The average gradient value is then simply a function of Ft, where Ft is the initial concentration of the genomic DNA used in labeling. Using these relationships, the concentration of genomic DNA used in labeling and the average gradient values were regressed using a sigmoid function. Regression fitting correlated with the experimental results well, with an r2 value of 0.96 (Figure 5), showing that the application of a hybridization kinetic model for this kind of analysis is possible. Furthermore, estimation of the concentration of a specific genomic DNA is possible when using the intensity from multiple spots on the DNA microarray. Therefore, the intensity values can be used to quantify the initial genomic DNA concentration for a specific bacterium, even when present within complex communities, and offer a more concise analysis of the samples. Regression analysis can also be used as an index to estimate the genomic DNA concentration in the test samples. Considering the random priming reaction used for probe labeling, the random hexamer, which represents all sequence combinations, binds to the templates in a statistical manner and would guarantee a high degree of repeatability within strict hybridization conditions, which might result in similar gradient levels from the different bacteria for the same labeling concentration. As well, if the VOL. 38, NO. 24, 2004 / ENVIRONMENTAL SCIENCE & TECHNOLOGY
relationships between the gradient values and initial genomic DNA concentration correlate well, the signal intensity could be used to estimate the initial concentration of genomic DNA in the test sample. Further, if the DNA concentration and bacterial cell number, or mass, are correlated, it would be possible to estimate the number of bacterial cells present in the original sample. G. amarae, in particular, causes foaming and bulking when present in numbers of greater than $10^5$-$10^6$ cfu/mL in laboratory bench-scale activated sludge (32). To validate that this chip is sensitive even when using environmental samples, we added $10^6$, $10^5$, $10^4$, $10^3$, or $10^2$ cfu of G. amarae cells to 1 mL of sludge. After extraction of the genomic DNA from the mixed stocks, a total of 500 ng of DNA was labeled with Cy3-dCTP. Figure 6 shows the raw Cy3 intensities obtained and the gradient value for each concentration of G. amarae cells. In Figure 6A, which shows the hybridization results using only sludge DNA, i.e., no G. amarae, there was no binding of the G. amarae and Z. ramigera probes, while only two of the M. peregrinum spots showed positive signals. When G. amarae was present in the sludge, however, positive hybridization results were seen only within the G. amarae region of the chip, apart from the same two spots within the M. peregrinum region. The proportion of positive hybridizations was 43% (i.e., 18 out of 42 probes showed hybridization) when the G. amarae cell concentration was $10^4$ cfu and higher, and 26% (11 out of 42) with $10^3$ cfu. With $10^2$ cfu, positive signals were not obtained for any of the probes (data not shown). In comparison, with a purely G. amarae genomic DNA test, 100% of the probes showed positive hybridization with 240, 120, and 30 ng of DNA and 88% with 3.75 ng of the genomic DNA (Figure 3A).
This decrease in the number of positive signals was thought to reflect the probability of generating labeled probes in the random priming reaction and the efficiency of extracting the target genomic DNA. Although the number of positive hybridizations decreased, their gradient values could be obtained as was done with the pure genome tests. As the number of G. amarae cells mixed with the 1 mL of sludge was decreased, the gradient value also decreased for all the probes showing hybridization (Figure 6B). This could be used as a basis to detect target bacteria quantitatively from environmental samples. Even though the exact amount of G. amarae genomic DNA extracted from the mixture of sludge is not known, positive hybridizations were seen. Furthermore, regression fitting of the experimental results showed a good correlation between the cfu number and the gradient value, with an $r^2$ value of 0.91 (Figure 6B), showing that the application of the hybridization kinetic model to estimate the bacterial concentration within mixed environmental samples is possible. Although a direct relationship between extracted DNA concentration and the bacterial cell number was not fully established, on the basis of the results in Figures 5 and 6B, the signals from this microarray chip could be used to quantify a specific bacterial population when it is present at a concentration above a certain threshold. This study has shown that it is possible to develop a DNA microarray chip for specific bacterial identification or the analysis of a complex bacterial community without knowing the sequence of the probes. Furthermore, it is possible to quantitatively estimate the DNA concentration of a given bacterium within the random priming reaction.
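The regression and the quantitative read-out described in this passage can be sketched in a few lines of Python. This is an illustrative sketch only: the study reports using SigmaPlot, whereas the code below swaps in a simple grid search over the rate constant with a closed-form amplitude. The function names, the search range, and the data points are assumptions; the points are synthetic values generated from the parameters reported in the Figure 6B caption, not the measured values.

```python
import math


def saturation(x, a, b):
    """Kinetic form used in the text: a * (1 - exp(-b * x))."""
    return a * (1.0 - math.exp(-b * x))


def fit_saturation(xs, ys, b_min=1e-8, b_max=1e-3, steps=2001):
    """Least-squares fit of a and b.

    b is scanned on a logarithmic grid (an assumed search range); for each
    candidate b the best amplitude a follows in closed form because the
    model is linear in a.
    """
    best = None
    for i in range(steps):
        b = b_min * (b_max / b_min) ** (i / (steps - 1))
        h = [1.0 - math.exp(-b * x) for x in xs]
        denom = sum(v * v for v in h)
        if denom == 0.0:
            continue
        a = sum(y * v for y, v in zip(ys, h)) / denom
        sse = sum((y - a * v) ** 2 for y, v in zip(ys, h))
        if best is None or sse < best[2]:
            best = (a, b, sse)
    if best is None:
        raise ValueError("no usable rate constant in the search range")
    return best[0], best[1]


def estimate_x(gradient, a, b):
    """Invert the fitted curve: x = -ln(1 - G/a) / b, valid for 0 <= G < a."""
    if not 0.0 <= gradient < a:
        raise ValueError("gradient must lie in [0, a) to be invertible")
    return -math.log(1.0 - gradient / a) / b


# Synthetic, noise-free points generated from the parameters reported in the
# Figure 6B caption; these are NOT the measured values from the study.
A_REPORTED, B_REPORTED = 0.1127, 8.314e-6
cfu = [1e3, 1e4, 1e5, 1e6]
gradients = [saturation(n, A_REPORTED, B_REPORTED) for n in cfu]

a_fit, b_fit = fit_saturation(cfu, gradients)
print(f"a = {a_fit:.4f}, b = {b_fit:.3e}")
print(f"estimated cfu for G = 0.05: {estimate_x(0.05, a_fit, b_fit):.0f}")
```

The inversion is only defined below the plateau value `a`; gradient values at or near the plateau carry little information about cell number, which is one reason any such estimate would be restricted to a working range.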
FIGURE 6. (A) F532 (Cy3) signal intensity after hybridization with only sludge DNA and with DNA from sludge containing $10^6$, $10^5$, $10^4$, and $10^3$ cfu of G. amarae in 1 mL of sludge. The spot ID number refers to the random probes from G. amarae (1-42), M. peregrinum (43-92), and Z. ramigera (93-142). (B) Regression of the gradient values versus the G. amarae cfu number. The gradient values for each cell concentration were derived by plotting the F532 intensity versus the F635 intensity from the probes showing positive hybridization, as in Figure 4. Both the intensity bars in (A) and error bars in (B) are from duplicate samples on the same slide. The best fit plot is represented by a sigmoid curve (solid line) in (B), with the gradient value = $0.1127\,(1 - e^{-8.314 \times 10^{-6}\,[\text{cell number in 1 mL of sludge}]})$, an $r^2$ value of 0.91, and P < 0.01, obtained with SigmaPlot. Abbreviation: GA = G. amarae.

Therefore, this system is an effective biomolecular tool to discriminate between different species, or genera, of bacteria within complex communities, such as activated sludge, while also providing a means of quantitatively estimating the population of the bacteria. Also, probe fabrication is rapid because no specific primers are needed. Although some spots were cross-hybridized, as in the case with M. peregrinum, which is thought to be due to the use of conserved regions and might lead to a reduced detection efficiency, the results presented here show that this was not very common and that selecting random fragments to fabricate a DNA microarray chip is a promising approach. One point that should be considered when using this approach is that the random selection of highly conserved regions from different bacterial strains during probe fabrication would generate nonspecific responses in the scanned results. This is due to cross-hybridization of the probes by genes present in numerous bacteria. For instance, if highly conserved regions, such as the 16S ribosomal gene, are randomly selected, the specificity of the array will be reduced. In this study, however, the portion
of conserved regions selected is thought to be minimal since the genomic DNA was digested with different restriction enzyme pairs and the size of each probe was different. As well, in cases where cross-hybridization is obvious, these probes can be omitted from further fabrications of the DNA microarray chips, making the final DNA microarray chip more accurate in its detection and response capabilities; meanwhile, probe generation could be repeated with different sets or pairs of restriction enzymes. Despite their potential and the progressive studies done with the DNA microarray for environmental applications (14, 20, 33, 34), DNA microarray methods have not received much interest in field applications because of their limited sensitivity and the inconvenience inherent in the experimental procedure and data analysis, and have remained mainly laboratory-based methods. Furthermore, the reproducibility of the results was found to differ between laboratories, and even within a laboratory (35). In this study, using well-controlled experimental conditions, good reproducibility and performance were seen with the developed DNA microarray chip. The other major issue is the sensitivity. For application to a complex soil population, the quantity of DNA available for labeling and hybridization would be limited (30). To overcome this, the DNA extracted from the soil was initially concentrated several times before labeling. Therefore, for field applications, the efficiency of DNA extraction and the concentration of the target DNA, for detection, should be considered. Activated sludge, on the other hand, would have higher extraction efficiencies than soil because of its denser population of microorganisms. However, for both situations, a random polymerase chain reaction could be used to amplify the genomic fragments to increase the chance of detection. Finally, as the genomes of various bacteria are sequenced, the exact sequence information for numerous microorganisms will be available.
The knowledge of the genetic makeup of individual bacteria will allow researchers to specifically select distinct regions and expand this technology to multiple bacterial species, or to a genus-screening DNA microarray chip, as suits their purposes and on the basis of the distinguishable results of the DNA microarray chip. Until the sequences of the bacteria are known, therefore, DNA microarray chips made with random, unsequenced DNA fragments are a viable option for the identification of specific bacteria and can be used as a quality and operational control tool. Although the results described in this study are preliminary and limited to only the three bacteria used, this study is still valuable for the identification of specific bacteria in a certain bacterial community such as sludge. For instance, this DNA microarray chip can be implemented to detect unfavorable bacteria, such as G. amarae, or to check for multiple bacterial strains present in enriched bacterial communities, such as activated sludge. In other words, discriminating the presence of specific genomic DNA in mixed populations such as sludge in this way is a very new approach, especially considering the lack of adequate array techniques for such applications. The coverage of this DNA microarray chip could be expanded, and the chip made more useful for a comprehensive understanding of the system, if its development included many more strains and optimization for specificity.
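The probe-pruning step mentioned in the discussion, omitting obviously cross-hybridizing probes from later fabrications, can be sketched as follows. This is a hedged illustration rather than the authors' procedure: the spot ID ranges are taken from the Figure 6 caption, but the function names, the list of positive spots, and the two flagged spot IDs are hypothetical, and the sketch assumes that spots have already been called positive or negative by some criterion not specified here.

```python
# Spot ID ranges as given in the Figure 6 caption.
PROBE_RANGES = {
    "G. amarae": range(1, 43),
    "M. peregrinum": range(43, 93),
    "Z. ramigera": range(93, 143),
}


def species_of(spot_id):
    """Return the species whose probe block contains this spot ID."""
    for species, ids in PROBE_RANGES.items():
        if spot_id in ids:
            return species
    raise ValueError(f"unknown spot ID: {spot_id}")


def cross_hybridizing_probes(positive_spots, species_present):
    """List positive spots whose source species is absent from the sample."""
    return sorted(
        spot for spot in positive_spots
        if species_of(spot) not in species_present
    )


def pruned_probe_set(flagged):
    """Spot IDs to keep in the next fabrication round."""
    drop = set(flagged)
    return [s for ids in PROBE_RANGES.values() for s in ids if s not in drop]


# Hypothetical scan of a sample known to contain only G. amarae: spots 50 and
# 71 are invented stand-ins for two cross-hybridizing M. peregrinum probes.
positives = [3, 7, 12, 50, 71]
flagged = cross_hybridizing_probes(positives, {"G. amarae"})
kept = pruned_probe_set(flagged)
print(flagged, len(kept))
```

In practice such a filter would be run on pure-culture or defined-mixture hybridizations, where the species actually present are known, before the pruned probe list is used to print a revised chip.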
Acknowledgments This work was supported by KOSEF through the Advanced Environmental Monitoring Research Center (ADEMRC) at the Gwangju Institute of Science and Technology (GIST) and in part by the 2001 National Research Laboratory (NRL) program of KISTEP. We express our gratitude for this support.
Literature Cited (1) Soddell, J. A.; Seviour, R. J. A review: microbiology of foaming in activated sludge plants. J. Appl. Bacteriol. 1990, 69, 145-176. (2) Blackall, L. L.; Burrell, P. C.; Gwilliam, H.; Bradford, D.; Bond, P. L.; Hugenholtz, P. The use of 16S rDNA clone libraries to describe the microbial diversity of activated sludge communities. Water Sci. Technol. 1998, 37, 451-454. (3) Blackall, L. L. Molecular identification of activated sludge foaming bacteria. Water Sci. Technol. 1994, 29, 35-42. (4) Cha, D. K.; Jenkins, D.; Lewis, W. P.; Kido, W. H. Process control factors influencing Nocardia populations in activated sludge. Water Environ. Res. 1992, 64, 37-43. (5) Pagilla, K. R.; Jenkins, D.; Kido, W. P. Nocardia control in activated sludge by classifying selectors. Water Environ. Res. 1996, 68, 235-239. (6) Lechevalier, M. P.; Lechevalier, H. A. Nocardia amarae sp. nov., an actinomycete common in foaming activated sludge. Int. J. Syst. Bacteriol. 1974, 24, 278-288. (7) Blackall, L. L.; Seviour, E. M.; Bradford, D.; Stratton, H. M.; Cunningham, M. A.; Hugenholtz, P.; Seviour, R. J. Towards understanding the taxonomy of some of the filamentous bacteria causing bulking and foaming in activated sludge plants. Water Sci. Technol. 1996, 34, 137-144. (8) Soddell, J. Encyclopedia of Environmental Microbiology; Wiley-Interscience: Hoboken, NJ, 2002; Vol. 1, pp 1-7. (9) Cha, D. K.; Fuhrmann, J. J.; Kim, D. W.; Golt, C. M. Fatty acid methyl ester (FAME) analysis for monitoring Nocardia levels in activated sludge. Water Res. 1999, 33, 1964-1966. (10) Morisada, S.; Miyata, N.; Iwahori, K. Immunomagnetic separation of scum-forming bacteria using polyclonal antibody that recognizes mycolic acids. J. Microbiol. Methods 2002, 51, 141-148. (11) De los Reyes, M. F.; De los Reyes III, F. L.; Hernandez, M.; Raskin, L. Quantification of Gordonia amarae strains in foaming activated sludge and anaerobic digester systems with oligonucleotide hybridization probes. Appl.
Environ. Microbiol. 1998, 64, 2503-2512. (12) Jenkins, D.; Richard, M. G.; Daigger, G. T. Manual on the Cause and Controls of Activated Sludge Bulking and Foaming; Lewis Publishers: Boca Raton, FL, 1993. (13) Cole, J. R.; Chai, B.; Marsh, T. L.; Farris, R. J.; Wang, Q.; Kulam, S. A.; Chandra, S.; McGarrell, D. M.; Schmidt, T. M.; Garrity, G. M.; Tiedje, J. M. The Ribosomal Database Project (RDP-II): Previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res. 2003, 31, 442-443. (14) Zhou, J.; Thompson, D. K. Challenges in applying microarrays to environmental studies. Curr. Opin. Biotechnol. 2002, 13, 204-207. (15) Kitagawa, E.; Momose, Y.; Iwahashi, H. Correlation of the structure agricultural fungicides to gene expression in Saccharomyces cerevisiae upon exposure to toxic doses. Environ. Sci. Technol. 2003, 37, 2788-2793. (16) Dennis, P.; Edwards, E. A.; Liss, S. N.; Fulthorpe, R. Monitoring gene expression in mixed microbial communities by using DNA microarrays. Appl. Environ. Microbiol. 2003, 69, 769-778. (17) Wang, D.; Urisman, A.; Liu, Y. T.; Springer, M.; Ksiazek, T. G.; Erdman, D. D.; Mardis, E. R.; Hickenbotham, M.; Magrini, V.; Eldred, J.; Latreille, J. P.; Wilson, R. K.; Ganem, D.; DeRisi, J. L. Viral discovery and sequence recovery using DNA microarrays. PLOS Biol. 2003, 1, 257-260. (18) Stenger, D. A.; Andreadis, J. D.; Vora, G. J.; Pancrazio, J. J. Potential applications of DNA microarrays in biodefense-related diagnostics. Curr. Opin. Biotechnol. 2002, 13, 208-212. (19) Fukushima, M.; Kakinuma, K.; Hayashi, H.; Nagai, H.; Ito, K.; Kawaguchi, R. Detection and identification of Mycobacterium species isolates by DNA microarray. J. Clin. Microbiol. 2003, 41, 2605-2615. (20) Cho, J. C.; Tiedje, J. M. Bacterial species determination from DNA-DNA hybridization by using genome fragments and DNA microarray. Appl. Environ. Microbiol. 2001, 67, 3677-3682.
(21) Bekal, S.; Brousseau, R.; Masson, L.; Prefontaine, G.; Fairbrother, J.; Harel, J. Rapid identification of Escherichia coli pathotypes by virulence gene detection with DNA microarray. J. Clin. Microbiol. 2003, 41, 2113-2125. (22) Su, M. C.; Cha, D. K.; Anderson, P. R. Influence of selector technology on heavy metal removal by activated sludge: secondary effects of selector technology. Water Res. 1995, 29, 971-976. (23) Rodríguez-Gancedo, M. B.; Rodríguez-González, T.; Yagüe, G.; Valero-Guillén, P. L.; Segovia-Hernández, M. Mycobacterium peregrinum bacteremia in an immunocompromised patient with a Hickman catheter. Eur. J. Clin. Microbiol. Infect. Dis. 2001, 20, 589-590. (24) Love, C. A.; Lilley, P. E.; Dixon, N. E. Stable high-copy-number bacteriophage λ promoter vector for overproduction of proteins in Escherichia coli. Gene 1996, 176, 49-53. (25) Wu, C. F.; Valdes, J. J.; Bentley, W. E.; Sekowski, J. W. DNA microarray for discriminate between O157:H7 EDL933 and nonpathogenic Escherichia coli strains. Biosens. Bioelectron. 2003, 19, 1-8. (26) Small, J.; Call, D. R.; Brockman, F. J.; Straub, T. M.; Chandler, D. P. Direct detection of 16S rRNA in soil extracts by using oligonucleotide microarrays. Appl. Environ. Microbiol. 2001, 67, 4708-4716. (27) Lessie, T. G.; Hendrickson, W.; Manning, B. D.; Devereux, R. Genomic complexity and plasticity of Burkholderia cepacia. FEMS Microbiol. Lett. 1998, 144, 117-128. (28) Sokal, R. R.; Sneath, P. H. Principles of Numerical Taxonomy; W. H. Freeman and Co.: San Francisco, CA, 1963. (29) Purohit, H. J.; Kapley, A.; Moharikar, A. A.; Narde, G. A novel approach for extraction of PCR-compatible DNA from activated sludge samples collected from different biological effluent treatment plants. J. Microbiol. Methods 2003, 52, 315-323. (30) Cho, J. C.; Tiedje, J. M. Quantitative detection of microbial genes using DNA microarrays. Appl. Environ. Microbiol. 2002, 68, 1425-1430.
(31) Kafatos, F.; Jones, C. W.; Efstratiadis, A. Determination of nucleic acid sequence homologies and relative concentrations by a dot hybridization procedure. Nucleic Acids Res. 1979, 7, 1541-1551. (32) Goi, M.; Nishimura, T.; Kuribayashi, S.; Okouchi, T.; Murakami, T. An experimental study to suppress scum formation accompanying the abnormal growth of Nocardia by adding ozone in the aeration tank. Water Sci. Technol. 1994, 30, 231-234. (33) Sebat, J. L.; Colwell, F. S.; Crawford, R. L. Metagenomic profiling: microarray analysis of an environmental genomic library. Appl. Environ. Microbiol. 2003, 69, 4927-4934. (34) Greer, C. W.; Whyte, L. G.; Lawrence, J. R.; Masson, L.; Brousseau, R. Genomics technologies for environmental science. Environ. Sci. Technol. 2001, 35, 360A-366A. (35) Piper, M. D. W.; Daran-Lapujade, P.; Bro, C.; Regenberg, B.; Knudsen, S.; Nielsen, J.; Pronk, J. T. Reproducibility of oligonucleotide microarray transcriptome analyses. An interlaboratory comparison using chemostat cultures of Saccharomyces cerevisiae. J. Biol. Chem. 2002, 277, 37001-37008.
Received for review December 16, 2003. Revised manuscript received August 27, 2004. Accepted September 13, 2004. ES035398O