Anal. Chem. 2008, 80, 9005–9012
Identification of Viruses Using Microfluidic Protein Profiling and Bayesian Classification

Julia A. Fruetel,* Jason A. A. West,† Bert J. Debusschere, Kyle Hukari,† Todd W. Lane, Habib N. Najm, Jose Ortega, Ronald F. Renzi, Isaac Shokair, and Victoria A. VanderNoot

Sandia National Laboratories, Livermore, California 94551-0969

We present a rapid method for the identification of viruses using microfluidic chip gel electrophoresis (CGE) of high-copy number proteins to generate unique protein profiles. Viral proteins are solubilized by heating at 95 °C in borate buffer containing detergent (5 min), then labeled with fluorescamine dye (10 s), and analyzed using the µChemLab CGE system (5 min). Analyses of closely related T2 and T4 bacteriophage demonstrate sufficient assay sensitivity and peak resolution to distinguish the two phage. CGE analyses of four additional viruses (MS2 bacteriophage, Epstein-Barr, respiratory syncytial, and vaccinia viruses) demonstrate reproducible and visually distinct protein profiles. To evaluate the suitability of the method for unique identification of viruses, we employed a Bayesian classification approach. Using a subset of 126 replicate electropherograms of the six viruses and phage for training purposes, successful classification with nontraining data was 66/69 or 95% with no false positives. The classification method is based on a single attribute (elution time), although other attributes such as peak width, peak amplitude, or peak shape could be incorporated and may improve performance further. The encouraging results suggest a rapid and simple way to identify viruses without requiring specialty reagents such as PCR probes and antibodies.

Virus isolation in cell cultures has long served as a “gold standard” method for virus identification.1,2 Advantages of this method include the ability to isolate a wide variety of viruses, the availability of an isolate for additional studies, and increased sensitivity over rapid antigen tests.
Disadvantages are the long incubation periods required (days to weeks), technical expertise needed to read and interpret the cytopathic effect, and the cost and maintenance of a variety of cell culture types. Nonculture methods such as antigen detection by immunofluorescence show generally poorer sensitivity compared to cell culture, require expertise to read the results, and are not available for all viruses. Molecular methods such as PCR, although very sensitive, highly specific, and considerably faster than cell culture, suffer from high cost and sensitivity to polymerase inhibitors. Moreover, the higher mutation rates in viruses, especially retroviruses,3 may be problematic and result in false negatives when using dedicated PCR primers.4 Both PCR and viral antigen detection are useful for viruses that do not proliferate in standard cell cultures. A significant drawback to all these methods is that they require specialty reagents that depend on the virus to be detected, such as specialized cell culture lines, antibodies, and PCR primers.

Biosensors utilizing surface plasmon resonance and quartz-crystal microbalance have shown promise in detection of viral samples.5 However, these methods require incorporation of recognition elements such as affinity ligands which limit their usefulness in situations where the infectious agent may be unknown, such as early in a disease outbreak or for environmental monitoring of a potential bioterrorist attack.

Protein profiling is a technique broadly applicable to characterizing microorganisms and has been described predominantly in the mass spectrometry literature for identifying bacterial and viral proteins,6–10 although capillary electrophoresis methods have also been described.11,12 Using MALDI-TOF, for example, small acid-soluble proteins (SASPs) were found to be useful for identifying five Bacillus species.13 Protein profiling is very appealing for diagnostics, as it is an approach broadly applicable to a variety of organisms, including viruses, and does not require specialty reagents. Currently this approach is both labor and equipment intensive, however, typically requiring 2D gel separation of proteins followed by mass spectrometric analysis of the manually excised protein gel bands.

* To whom correspondence should be addressed. Julia A. Fruetel, Ph.D., Sandia National Laboratories, P.O. Box 969, MS 9292, Livermore, CA 94551-0969. Phone: 925-294-2724. Fax: 925-294-3020. E-mail: [email protected].
† Current address: Arcxis Biotechnologies, 6920 Koll Center Parkway, Suite 215, Pleasanton, CA 94566.
(1) Hsiung, G. D. Yale J. Biol. Med. 1984, 57, 727–733.
(2) Leland, D. S.; Ginocchio, C. C. Clin. Microbiol. Rev. 2007, 20, 49–78.
(3) Svarovskaia, E. S.; Cheslock, S. R.; Zhang, W. H.; Hu, W. S.; Pathak, V. K. Front. Biosci. 2003, 8, D117–D134.
(4) Clem, A. L.; Sims, J.; Telang, S.; Eaton, J. W.; Chesney, J. Virol. J. 2007, 4, 65–75.
(5) Amano, Y.; Cheng, Q. Anal. Bioanal. Chem. 2005, 381, 156–164.
(6) Krishnamurthy, T.; Rajamani, U.; Ross, P. L.; Eng, J.; Davis, M.; Lee, T. D.; Stahl, D. S.; Yates, J. ACS Symp. Ser. 2000, 745, 67–97.
(7) Jarman, K. H.; Cebula, S. T.; Saenz, A. J.; Petersen, C. E.; Valentine, N. B.; Kingsley, M. T.; Wahl, K. L. Anal. Chem. 2000, 72, 1217–1223.
(8) Ruelle, V.; El Moualij, B.; Zorzi, W.; Ledent, P.; De Pauw, E. Rapid Commun. Mass Spectrom. 2004, 18, 2013–2019.
(9) Cooper, B.; Eckert, D.; Andon, N. L.; Yates, J. R.; Haynes, P. A. J. Am. Soc. Mass Spectrom. 2003, 14, 736–741.
(10) Kim, Y. J.; Freas, A.; Fenselau, C. Anal. Chem. 2001, 73, 1544–1548.
(11) Kustos, I.; Kocsis, B.; Kerepesi, I.; Kilar, F. Electrophoresis 1998, 19, 2317–2323.
(12) Zhang, E.; Carpenter, E.; Puyang, X.; Dovichi, N. J. Electrophoresis 2001, 22, 1127–1132.
(13) Hathout, Y.; Setlow, B.; Cabrera-Martinez, R. M.; Fenselau, C.; Setlow, P. Appl. Environ. Microbiol. 2003, 69, 1100–1107.
10.1021/ac801342m CCC: $40.75 © 2008 American Chemical Society. Published on Web 11/04/2008
Analytical Chemistry, Vol. 80, No. 23, December 1, 2008

We have developed a microfluidic protein profiling approach using protein solubilization coupled with microfluidic chip gel
electrophoresis (CGE) and have demonstrated it for vegetative bacteria and spores. From the peak migration times and shapes for high-copy number proteins, we can broadly distinguish bacterial species from one another14 and spores from vegetative cells using an automated platform.15 This method is rapid ([...] >5 as evidence in favor of one agent versus another.

RESULTS AND DISCUSSION

Molecular Weight Calibration Using One- and Two-Color Detection. Prior to employing two-color detection, the one-color system was used to generate a calibration curve of molecular weight versus migration time for CGE using the µChemLab instrument. The following set of proteins was used in addition to the dye HPTS (0.3 kDa): CCK peptide (1.1 kDa), α-lactalbumin (14.2 kDa), carbonic anhydrase (29.5 kDa), ovalbumin (45 kDa), bovine serum albumin (65 kDa), and IgG (150 kDa). Seven separate measurements, each containing the dye and the six protein standards, were used to generate the calibration curve. The results and an empirical fit of molecular weight versus migration time are shown in Figure 2. For these measurements, the device was operated in constant current mode and the resulting migration times were linearly corrected using the peaks corresponding to HPTS and IgG as standards. The fit shown, although completely empirical, can be used to infer molecular weights of proteins in complex mixtures. The fit in the figure is given by
\[
\mathrm{MW} = \left[1 + \tanh\bigl(\alpha (t - T_0)\bigr)\right] \left\{ A + B\left(\frac{t}{\tau}\right) + C\left(\frac{t}{\tau}\right)^{2} \right\} \qquad (2)
\]

where t is the migration time in seconds, and the coefficients are α = 0.015 s⁻¹, T0 = 170 s, A = 46 kDa, B = −146 kDa, C = 117 kDa, and τ = 218.85 s. The coefficients for the fit are obtained by least-squares minimization of the residual from fitting the above analytic form to the measurements (total of 49 points used to fit the 5 free parameters; note that τ is used as a normalization parameter and can be absorbed into the coefficients of the polynomial and thus is not a free parameter). All values were weighted equally. The function in front of the quadratic in the above equation accounts for the very weak dependence of molecular weight on migration times for species whose molecular weights are below the gel sieving range (i.e., [...]

(27) Jeffreys, H. Theory of Probability, 3rd ed.; Oxford University Press: Oxford, U.K., 1998; p 432.

Figure 2. Calibration curve of MW as a function of migration time for one-color detection system running at normal parameters in constant current mode. The results of seven separate CGE runs where each sample contains six proteins and HPTS are shown. Note that both HPTS (fluorescent dye) and CCK (peptide, 1100 MW) are too small to be within the fractionation range of the Beckman gel and migrate primarily by zone electrophoresis.

Table 1. [...] (>100 Copies Per Phage) from Sequence Data and As Measured by SDS-PAGE and CGE Analysis
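As a minimal sketch of how the empirical calibration of eq 2 might be evaluated, the function below uses the coefficients quoted in the text. The function name and the choice of check points are illustrative assumptions; the two migration times are the internal reference peak times given in the Figure 3 caption (HPTS at 126 s, IgG at 312 s).

```python
import math

# Coefficients quoted in the text for the empirical fit (eq 2).
ALPHA = 0.015                  # 1/s
T0 = 170.0                     # s
A, B, C = 46.0, -146.0, 117.0  # kDa
TAU = 218.85                   # s, normalization of the migration time


def molecular_weight_kda(t_s: float) -> float:
    """Estimate molecular weight (kDa) from migration time (s) using eq 2.

    The name of this helper is hypothetical; only the functional form and
    the coefficient values come from the text.
    """
    x = t_s / TAU
    # Prefactor that suppresses the quadratic below the gel sieving range.
    prefactor = 1.0 + math.tanh(ALPHA * (t_s - T0))
    return prefactor * (A + B * x + C * x * x)


if __name__ == "__main__":
    # Reference peak times taken from the Figure 3 caption.
    for name, t in (("HPTS", 126.0), ("IgG", 312.0)):
        print(f"{name}: t = {t:.0f} s, MW = {molecular_weight_kda(t):.1f} kDa")
```

With these coefficients the sketch returns roughly 0.3 kDa at 126 s and roughly 149 kDa at 312 s, in line with the nominal masses of HPTS (0.3 kDa) and IgG (150 kDa).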
Figure 3. Comparison of viral protein analysis using SDS-PAGE and CGE. (A) SDS-PAGE of molecular weight standards (lane 1), T2 phage (lane 2), and T4 phage (lane 3). (B) CGE of T2 and T4 phage. The labeled peaks in the T4 electropherogram are (1) gp soc, (2) ipIII, (3) gp hoc, (4) gp23*, and (5) gp18. The peaks labeled with asterisks (*) are internal reference peaks corresponding to HPTS (126 s) and IgG (312 s).
weights for the major viral protein peaks to confirm peak assignments.

CGE and SDS-PAGE of T-Even Phage. The T-even bacteriophage, although considered to be distinct viral species, are quite closely related. T2 and T4 are morphologically indistinguishable by electron microscopy. Heteroduplex analysis has shown that there is greater than 85% homology between the genomes of all the T-even phage. Homologous genes in the T-even phage display greater than 95% sequence conservation. In many cases antiserum raised against T4 has shown cross reactivity with T2.28 The classical methods for discriminating between T2 and T4 are based on differences in bacterial receptors they target or the ability of one phage to exclude the other during infection of a host. These techniques take days to accomplish. The structure of T4 has been studied for decades, and the copy number of the majority of its structural proteins, including all major proteins, has been determined.29–31 This feature, in conjunction with the high degree of similarity between T-even phage, makes them an attractive and challenging test of our microfluidic approach to identify viral species.

(28) Schwarz, H.; Riede, I.; Sonntag, I.; Henning, U. EMBO J. 1983, 2, 375–380.
(29) Leiman, P. G.; Kanamaru, S.; Mesyanzhinov, V. V.; Arisaka, F.; Rossmann, M. G. Cell. Mol. Life Sci. 2003, 60, 2356–2370.
(30) Olson, N. H.; Gingery, M.; Eiserling, F. A.; Baker, T. S. Virology 2001, 279, 385–391.
(31) Coombs, D. H.; Arisaka, F. In Molecular Biology of Bacteriophage T4; Karam, J. D., Ed.; American Society for Microbiology: Washington, DC, 1994; pp 259–281.

SDS-PAGE analysis of T-even preparations is shown in Figure 3A. The phage demonstrate ∼20 distinct bands each that are detectable by Coomassie-blue staining. The molecular weights of these proteins range from 6 to 150 kDa. The most prominent band
T4 protein   copy no.   MW (kDa)   MW observed by SDS-PAGE (kDa)   MW observed by CGE (kDa)
ipI          360        8.5        ND                              ND
gp soc       840        9.1        10                              15.9
ipII         360        9.9        11                              ND
gp19         144        18.5       ND                              ND
ipIII        370        20.4       22.5                            23.1
gp hoc       160        40.4       42.5                            49.9
gp23*        960        48.7       51                              58.7
gp18         144        71.3       ND                              77.6
at 47 kDa corresponds to gp23*, the major component of the viral capsid that encapsulates the viral DNA (the star form of the protein indicates that it has been proteolytically cleaved during the maturation of the phage). T4 phage also shows prominent bands at ∼9 and 25 kDa, while T2 shows a prominent band at 17 kDa that is not observed for T4. CGE analysis of T2 and T4 phage using the µChemLab instrument also indicates ∼20 peaks of varying intensity for each phage (Figure 3B). The largest peaks roughly correlate with the major bands observed by SDS-PAGE; however, because of the different labeling methods, direct correlation is not observed. A summary of the major T4 proteins and their molecular weights as measured by SDS-PAGE and CGE is shown in Table 1. Molecular weight assignments using CGE are consistently higher than suggested by sequence information; this is also true to a lesser extent using SDS-PAGE. As viral proteins lack posttranslational modification, a reasonable assumption for the increase is covalent binding of fluorescent dye. The labeling protocol uses a >1000-fold excess of fluorescamine dye (MW 300) and is expected to drive labeling of lysine residues of denatured proteins to near completion.32 The additional mass as measured by CGE, between 3 and 10 kDa, is reasonably correlated with the number of lysine residues in the proteins. The higher molecular weight proteins typically have more lysines, and thus the discrepancy between the measured and actual weights is expected to be larger for these proteins (up to ∼10 kDa). Although this represents a significant error when attempting to back-calculate molecular weights from the CGE data, it is not a problem for the protein profiling approach. Further studies that compare protein standards labeled using the fluorescamine procedure with their Alexa-labeled counterparts may indicate the extent to which this discrepancy is attributable to the labeling.
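The size of the CGE mass discrepancy discussed above can be checked directly from Table 1. The sketch below computes the CGE-minus-sequence difference for the five proteins with a reported CGE value. The final helper, which converts a shift into an equivalent number of dye labels, rests on a loudly stated assumption that is not in the text: that each label adds about one dye mass (0.3 kDa) and that the shifts add linearly. It is a rough illustration only.

```python
# Sequence-derived and CGE-observed molecular weights (kDa) from Table 1,
# restricted to the proteins for which a CGE value is reported.
TABLE_1 = {
    "gp soc": (9.1, 15.9),
    "ipIII": (20.4, 23.1),
    "gp hoc": (40.4, 49.9),
    "gp23*": (48.7, 58.7),
    "gp18": (71.3, 77.6),
}

DYE_MASS_KDA = 0.3  # fluorescamine, MW ~300 as quoted in the text


def mass_shifts(table):
    """Return the CGE-minus-sequence mass difference (kDa) per protein."""
    return {name: round(cge - seq, 1) for name, (seq, cge) in table.items()}


def implied_labels(shift_kda, dye_mass_kda=DYE_MASS_KDA):
    """Equivalent number of dye labels for a given shift.

    Assumption (not from the text): each label contributes one dye mass
    and contributions add linearly to the apparent CGE molecular weight.
    """
    return shift_kda / dye_mass_kda


if __name__ == "__main__":
    for name, shift in mass_shifts(TABLE_1).items():
        print(f"{name:7s} shift = {shift:4.1f} kDa, "
              f"~{implied_labels(shift):.0f} label equivalents")
```

The computed shifts span 2.7 to 10.0 kDa, consistent with the "between 3 and 10 kDa" range stated in the text.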
(32) Fruetel, J. A.; Renzi, R. F.; VanderNoot, V. A.; Stamps, J.; Horn, B.; West, J.; Crocker, R.; Wiedenman, B.; Choi, W.; Shokair, I.; Ferko, S.; Yee, D.; Padgen, D. Electrophoresis 2005, 26, 1144–1154.

Figure 4. Comparison of viral protein analyses using CGE. Representative CGE traces of a series of viruses showing visually distinctive protein signatures for (A) MS2, (B) RSV, (C) EBV, and (D) vaccinia.

Although the CGE protein profiles for T2 and T4 phage are very similar, differences are also evident. We found using classical least-squares (CLS) analysis that these differences are statistically significant. Each plot was tested using the electropherogram along with the CLS fit using one hypothesis (either T2 or T4 assumed present). For the T2 measurement, the relative residual using the T4 signature compared to using the T2 signature is 5.3. This is a rather large value given the apparent high similarity between these two viruses and indicates the high degree of distinguishability
attained using the CGE method. Similarly, for the T4 measurement, the relative residual using the T2 signature compared to the T4 signature is 6.9, which again is a statistically significant difference.

CGE of MS2, RSV, EBV, and Vaccinia. To explore the general applicability of this method for identifying viruses based on protein profiles, CGE analysis of four additional viruses was undertaken. MS2, an icosahedral RNA bacteriophage 27–34 nm in diameter, is the simplest of the six viruses and phage studied. The MS2 capsid is formed by 180 copies of the coat protein (MW 13.7 kDa).33,34 The virion also contains one copy of the maturation protein “A” (MW 44 kDa). The CGE protein profile (Figure 4A) shows primarily one protein which coelutes with red-labeled lactalbumin internal standard (MW 13 kDa), indicating it corresponds to the major capsid coat protein. Additional lower molecular weight peaks are also evident; this may reflect impurities in the phage preparation and/or some degradation of the phage sample.

RSV, a member of the Paramyxoviridae family, is spherical in shape, with six major proteins composing the virion: three proteins associated with helical RNA in the nucleocapsid core, two envelope glycoproteins, and a matrix protein.35–37 The CGE protein profile for RSV (Figure 4B) shows a more complex pattern of peaks, consistent with the more complex protein composition of RSV as compared to MS2.

(33) Valegard, K.; Liljas, L.; Fridborg, K.; Unge, T. Nature 1990, 345, 36–41.
(34) Golmohammadi, R.; Valegard, K.; Fridborg, K.; Liljas, L. J. Mol. Biol. 1993, 234, 620–639.
(35) Stott, E. J.; Taylor, G. Arch. Virol. 1985, 84, 1–52.
(36) Wunner, W. H.; Pringle, C. R. Virology 1976, 73, 228–243.
(37) Bachi, T.; Hower, C. J. Virol. 1973, 12, 1173–1180.
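The classical least-squares (CLS) comparison reported earlier in this section for T2 and T4 can be illustrated with a minimal sketch. The text does not give the details of the CLS model, so everything here is an assumption: a single-component fit (measurement approximated by one scaled signature, no baseline term), synthetic Gaussian-peak signatures labeled A and B rather than real phage data, and hypothetical function names. The relative residual is taken as the residual norm under the competing signature divided by the residual norm under the matching one.

```python
import math


def gaussian_trace(times, peaks):
    """Synthetic electropherogram: sum of Gaussian peaks.

    peaks is a list of (center_s, width_s, height) tuples.
    """
    return [
        sum(h * math.exp(-0.5 * ((t - c) / w) ** 2) for c, w, h in peaks)
        for t in times
    ]


def cls_residual(measured, signature):
    """Residual norm after fitting measured ~ a * signature by least squares."""
    a = sum(m * s for m, s in zip(measured, signature)) / sum(
        s * s for s in signature
    )
    return math.sqrt(
        sum((m - a * s) ** 2 for m, s in zip(measured, signature))
    )


def relative_residual(measured, competing_signature, matching_signature):
    """Ratio of residual norms: competing hypothesis over matching one."""
    return cls_residual(measured, competing_signature) / cls_residual(
        measured, matching_signature
    )


# Two similar synthetic signatures that differ in one peak position.
times = [float(t) for t in range(100, 351)]
sig_a = gaussian_trace(times, [(150, 3, 1.0), (200, 3, 0.6), (260, 4, 2.0)])
sig_b = gaussian_trace(times, [(150, 3, 1.0), (208, 3, 0.6), (260, 4, 2.0)])

# Synthetic measurement: scaled copy of signature A plus a small ripple.
measured = [0.8 * s + 0.01 * math.sin(0.3 * t) for s, t in zip(sig_a, times)]

ratio = relative_residual(measured, sig_b, sig_a)
print(f"relative residual (B vs A) = {ratio:.1f}")
```

A ratio well above 1 indicates that the measurement is fitted much better by its own signature than by the competing one, which is the sense in which the values 5.3 and 6.9 are interpreted in the text.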
EBV, a member of the Herpesviridae family, contains a toroidal-shaped core (DNA around protein) surrounded by a capsid, the tegument (a protein-filled region), and an envelope containing numerous glycoproteins.38 Herpes viruses are complex and contain ∼35 virion proteins. The CGE profile (Figure 4C) shows many resolved and partially resolved protein peaks.

Vaccinia virus, of the Poxviridae family, is the most complex virus analyzed in this study. It contains many proteins (more than 100), and the detailed structure is not known. The outer surface is composed of lipid and protein and surrounds the core, containing a tightly compressed nucleoprotein, and two “lateral bodies” (function unknown). There are at least 10 enzymes present in the particle, mostly concerned with nucleic acid metabolism/genome replication.39 The CGE profile (Figure 4D) is quite complex, showing many unresolved protein peaks of comparable amplitude. The large number of vaccinia proteins is likely not fully resolved with this method.

Each virus demonstrates a visually distinct CGE protein profile. Parameters which define the differences include the number of peaks, peak migration times, relative differences in migration times, and relative peak intensities. In order to test whether such profiles are sufficiently unique to distinguish one virus from another, multiple analyses of each virus were performed using the µChemLab instrument connected to an autosampler. The data from these runs were analyzed for reproducibility as well as for training and testing data using Bayesian classification.

Run-to-Run Variability Correction. To correct for run-to-run variability in the elution times of the peaks in the electropherograms, the signals can be mapped to a reference time trace.
Given the locations of the known peaks in the calibration signal (identified using the analyte classification approach described in the Experimental Section), a reference mapping is obtained by fitting a relationship between the locations of the observed standards peaks and the locations of those peaks in the reference signal. If the reference signal is chosen as the average over a number of representative runs, then the deviations between the peak locations in the observed and reference signal are small enough to be corrected with a linear mapping. Once the peak locations are mapped to a reference time trace, it is straightforward to map them to their corresponding molecular weight with the empirical relationship described above (eq 2). The application of the reference mapping to 126 runs of the calibration signal is shown in Figure 5A and to 18 runs of T4 (10 largest peaks) in Figure 5B. It is clear that the linear mapping provides an adequate correction for the elution time variability.

Analyte Classification. The analyte classification algorithm was applied to a set of 126 runs of the following viruses: MS2 (41 runs), RSV (14 runs), vaccinia (29 runs), EBV (13 runs), T2 (11 runs), and T4 (18 runs). Of these 126 runs, 57 were used as training data: MS2, 17/41; RSV, 5/14; vaccinia, 12/29; EBV, 8/13; T2, 6/11; and T4, 9/18. For each virus, four to five distinguishing protein peaks were identified and the distance between them characterized from the training data. Figure 3B shows an example of the peaks observed in the elution spectra of T2 and T4. For

(38) Johannsen, E.; Luftig, M.; Chase, M. R.; Weicksel, S.; Cahir-McFarland, E.; Illanes, D.; Sarracino, D.; Kieff, E. Proc. Natl. Acad. Sci. U.S.A. 2004, 101, 16286–16291.
(39) Condit, R. C.; Moussatche, N.; Traktman, P. Adv. Virus Res. 2006, 66, 31–111.
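The linear reference mapping described under Run-to-Run Variability Correction can be sketched as a least-squares line fitted between observed and reference standards peak times, then applied to every peak in the run. This is a minimal illustration under stated assumptions: the function names are hypothetical, the reference times are the HPTS and IgG peak times from the Figure 3 caption (126 s and 312 s), and the "observed" times are invented values representing a drifted run.

```python
def fit_linear_map(observed, reference):
    """Least-squares slope and intercept mapping observed onto reference times."""
    n = len(observed)
    if n < 2 or n != len(reference):
        raise ValueError("need at least two matched standards peaks")
    mean_o = sum(observed) / n
    mean_r = sum(reference) / n
    sxx = sum((o - mean_o) ** 2 for o in observed)
    if sxx == 0.0:
        raise ValueError("observed standards peaks must be distinct")
    sxy = sum((o - mean_o) * (r - mean_r) for o, r in zip(observed, reference))
    slope = sxy / sxx
    return slope, mean_r - slope * mean_o


def correct_times(times, slope, intercept):
    """Apply the linear mapping to a list of peak elution times (s)."""
    return [slope * t + intercept for t in times]


# Reference standards peaks: HPTS and IgG times from the Figure 3 caption.
REFERENCE_STANDARDS = [126.0, 312.0]

# Hypothetical drifted run: standards peaks found later than the reference.
observed_standards = [130.0, 322.0]

slope, intercept = fit_linear_map(observed_standards, REFERENCE_STANDARDS)

# Map the standards and one hypothetical analyte peak onto the reference axis.
corrected = correct_times([130.0, 226.0, 322.0], slope, intercept)
print(corrected)
```

With only two standards the fitted line passes exactly through both reference points; with more standards the same function returns the least-squares line. The corrected times can then be passed to the molecular weight calibration of eq 2.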
Figure 5. (A) Elution times of peaks in the calibration signal of 126 runs, before (left) and after (right) time correction using a linear mapping to a reference signal. Each symbol/color combination corresponds to a different run. (B) Elution times of the 10 largest peaks in 18 T4 runs before (left) and after (right) time correction with the linear mapping inferred from the standards peaks.

Table 2. Performance of the Classifier on Nontraining Samples of T2, T4, Vaccinia, MS2, EBV, and RSV in Terms of the Number of Runs Where the Bayes Factor >5, Which Is Considered Decisive Evidence of the Agent Being Present

classification results: no. of runs, not used for training
virus
total number of runs
no. of runs used as training data
Bayes factor > 5
Bayes factor