Anal. Chem. 2006, 78, 513-523

Multiple and Simultaneous Fluorophore Detection Using Fluorescence Spectrometry and Partial Least-Squares Regression with Sample-Specific Confidence Intervals

Michael L. Griffiths,* Romina P. Barbagallo, and Jacquie T. Keer

Analytical Technology Division, LGC, Queens Road, Teddington, U.K., TW11 0LY

Fluorescent labeling is widely used in biological and chemical analysis, and the drive for increased throughput is stretching multiplexing capabilities to the limit. The limiting factor in multiplexed analyses is the ability to subsequently deconvolute the signals. Consequently, alternative approaches for interpreting complex data sets are required to allow individual components to be identified. Here we have investigated the application of a novel approach to multiplexed analysis that does not rely on multivariate curve resolution to achieve signal deconvolution. The approach calculates a sample-specific confidence interval for a multivariate (partial least-squares regression (PLSR)) prediction, thereby enabling the estimation of the presence or absence of each fluorophore based on the total spectral signal. This approach could potentially be applied to any multiplexed measurement system and has the advantage over the current algorithm-based methods that the requirement for resolution of spectral peaks is not central to the method. Here, PLSR was used to obtain the concentrations for up to eight dye-labeled oligonucleotides at levels of (0.6-5.3) × 10^-6 M. The sample-specific prediction intervals show good discrimination for the presence/absence of seven of the eight labeled oligonucleotides, with efficiencies ranging from ∼91 to 100%.

The labeling of biological components is necessary for a wide range of measurement and characterization techniques such as DNA hybridization including microarray-based analysis, ELISA, western blotting, separation science, and real-time PCR. There is a constant requirement for an increase in throughput in many areas, including screening in pharmaceutical product discovery, early diagnosis of disease involving detection of biochemical species at extremely low levels, and scanning of multiple genetic loci to assess susceptibility to particular disease conditions.
Greater throughput can be achieved in part through development of improved multiplexing capabilities. However, for many tests to be run in parallel in a single reaction, detection mechanisms such as label discrimination need to be improved to ensure that output signals can be accurately interpreted and ascribed.

* Corresponding author. Tel: +44 (0)20 8943 7352. Fax: +44 (0)20 8943 2767. E-mail: [email protected].
10.1021/ac051635p CCC: $33.50 Published on Web 12/14/2005

© 2006 American Chemical Society

Fluorescence is currently one of the most widely used labeling methodologies in biological analysis. The reasons are economy and practicality. However, the main drawback of fluorescent dyes to date is their low multiplexing capability. This is due to the broad peaks in the spectra of most fluorophores, which result in overlapping of signals between dyes. Problems with such spectral interference can be reduced by using high-quality narrow bandpass emission filters to select only signals at specific wavelengths and by careful selection of fluorochromes to maximize spectral separation. Even with such precautions, the algorithms commonly used to deconvolute fluorescent signals can only be used with a restricted number of fluorophores simultaneously, and quantitative accuracy is limited. Alternative approaches to increasing the number of multiplexed fluorescent assays include exploitation of physical differences in analytes, such as size in gel electrophoresis (ref 1) or HPLC (ref 2) separation or sequence composition in melting curve assays (ref 3). Much larger scale increases may be achieved by the use of a physical addressing system, as in the use of ordered arrays for simultaneous screening of low- or high-density targets (refs 4, 5).

To improve the level of parallel measurement, and hence throughput, the essential problem is to be able to deconvolute individual signals corresponding to the different analytes present in a multiplexed solution that potentially contains a high number of interfering fluorescence signals. The simplest aim is to determine whether each of the signals is present. Several possible avenues of approach are available that can potentially deal with the problem. Currently, much of the literature (ref 6) cites spectral deconvolution, or multivariate curve resolution, as the method of choice when determining the possible constituents within a multiplexed spectrum, essentially the deconvolution of overlapping signals.
(1) Uematsu, C.; Nishida, J.; Okano, K.; Miura, F.; Ito, T.; Sakaki, Y.; Kambara, H. Nucleic Acids Res. 2001, 29, E84.
(2) Dehainault, C.; Lauge, A.; Caux-Moncoutier, V.; Pages-Berhouet, S.; Doz, F.; Desjardins, L.; Couturier, J.; Gauthier-Villars, M.; Stoppa-Lyonnet, D.; Houdayer, C. Nucleic Acids Res. 2004, 32, e139.
(3) Wittwer, C. T.; Herrmann, M. G.; Gundry, C. N.; Elenitoba-Johnson, K. S. Methods 2001, 25, 430-42.
(4) Panicker, G.; Call, D. R.; Krug, M. J.; Bej, A. K. Appl. Environ. Microbiol. 2004, 70, 7436-44.
(5) Sapolsky, R. J.; Hsie, L.; Berno, A.; Ghandour, G.; Mittmann, M.; Fan, J. B. Genet. Anal. 1999, 14, 187-92.
(6) Docherty, F. T.; Monoghan, P. B.; Keir, R.; Graham, D.; Smith, W. E.; Cooper, J. M. Chem. Commun. 2004, 118-9.

Analytical Chemistry, Vol. 78, No. 2, January 15, 2006 513

In this study, we propose a technique based on the use of multivariate regression and the application of sample-specific prediction intervals recently presented by Faber and Bro (refs 7, 8) and used by Serneels et al. for the successful identification of microorganisms using trilinear partial least squares (ref 9). The coupling of multivariate regression with confidence intervals specific to each analyte/sample effectively produces a qualitative classification system, allowing the presence or otherwise of an analyte to be determined by comparing x ± PI (the predicted value, x, ± the sample-specific prediction interval) with the boundary value of zero. Assessment of method suitability is then possible using reliability measures such as false positive/negative rates, sensitivity, and specificity.

There are many regression techniques capable of prediction using more than one independent variable (e.g., instrumental response). One of the most popular multivariate modeling techniques is multiple linear regression (MLR). Unfortunately, MLR fails in practice with spectroscopic data because of the collinearity present in the calibration matrix X; i.e., some of the columns of X are linear combinations of the other columns. Collinearity, or ill conditioning, will make any subsequent model unstable in its predictive ability. Conventional MLR uses all the data in X to obtain a predictive model; hence, large amounts of noise in X are incorporated into any solution, subsequently producing poor predictive results. Although this can be successfully dealt with using stepwise linear regression techniques, the presence of large numbers of regressors still poses a potential problem. The analyst may well be faced with the choice of either accepting a large number of potentially informative regressors or removing informative areas of the spectrum to reduce collinearity.
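The collinearity argument can be made concrete with a toy calculation. The sketch below is not taken from the paper: the two pure-dye "spectra", the five mixtures, and the helper `matrix_rank` are all hypothetical, chosen so that the arithmetic is exact. It builds a calibration matrix X from two overlapping spectra measured in four channels and computes the rank of X and of XᵀX, the matrix that MLR would need to invert.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via exact Gaussian elimination on Fractions."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Look for a usable pivot at or below the current pivot row.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


# Hypothetical pure-component spectra (2 dyes x 4 wavelength channels).
S = [[1, 2, 1, 0],
     [0, 1, 2, 1]]
# Hypothetical concentrations (5 mixtures x 2 dyes).
C = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]

# Calibration matrix X = C S (5 samples x 4 channels).
X = [[sum(c[k] * S[k][j] for k in range(2)) for j in range(4)] for c in C]
# Gram matrix X^T X (4 x 4), which the MLR normal equations require.
gram = [[sum(row[i] * row[j] for row in X) for j in range(4)] for i in range(4)]

rank_X = matrix_rank(X)
rank_gram = matrix_rank(gram)
```

In this noise-free toy case the four channel columns carry only two independent directions, so the 4 × 4 matrix XᵀX is singular and the MLR coefficients are not uniquely defined. With real, noisy spectra the matrix is usually invertible in a strict sense but badly conditioned, which is the instability described above.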
One solution to the problem of collinearity is to reduce high numbers of correlated variables into a much smaller number of uncorrelated variables. This can be performed using data projection methods, an example of which is partial least-squares regression (PLSR). Detailed treatments of PLSR are numerous, including that by Martens and Naes (ref 10). The most widespread approach is often called PLSR1 (referred to from this point on as PLSR) and is the method used in this paper. In PLSR, each latent variable direction of the X matrix is modified so that covariance is maximized between it and the Y vector (i.e., the prediction of only 1 analyte at a time). The emphasis of this paper is on the application of sample-specific confidence intervals to a qualitative problem: is a given oligonucleotide present? The purpose of the current work is therefore to demonstrate the practical utility of sample-specific confidence intervals in classification problems and to highlight other areas of potential use.

(7) Faber, N. M.; Song, X. H.; Hopke, P. K. Trends Anal. Chem. 2003, 22, 330-4.
(8) Faber, N. M.; Bro, R. Chemom. Intell. Lab. Syst. 2002, 61, 133-49.
(9) Serneels, S.; Moens, M.; Van Espen, P. J. Anal. Chim. Acta 2004, 516, 1-5.
(10) Martens, H.; Naes, T. Multivariate Calibration; Wiley: Sussex, U.K., 1991.

EXPERIMENTAL SECTION

Sample (Oligonucleotide) Preparation and Instrumentation. The labeled oligonucleotides were purchased from Oswel (Hampshire, U.K.) and were HPLC purified. The oligonucleotides were designed to be complementary to bacterial genomic sequences and were each 5′ labeled with a fluorescent dye. The dye molecules used here were chosen as they are all commercially available labels used routinely in fluorescence detection of DNA.

HEX oligonucleotide: HEX, T*(CT)5 *CGGGCGCTCATCATAGTCTTTCTTA, with sequence complementary to the Escherichia coli eaeA gene encoding the adhesin intimin.
TAMRA oligonucleotide: TAMRA, ATAAATCGCCATTCGTTGACTAC, complementary to the verotoxin encoding gene VT1 (ref 11).
Cy3 oligonucleotide: Cy3, T*(CT)5 *CATAAATCGCCATTCGTTGACTAC, also complementary to the VT1 gene (ref 11).
ROX oligonucleotide: ROX, GCGTCATCGTATACACAGGAGCAG, complementary to a region of the VT2-encoding gene.
Cy5 oligonucleotide: Cy5, T*(CT)5 *CGCGTCATCGTATACACAGGAGCAG, also complementary to the same region of the VT2 locus.
R6G oligonucleotide: R6G, CCCCACTGCTGCCTCCCGTAG, complementary to a conserved prokaryotic 16S rRNA sequence, modified from Nelson et al. (ref 12).
FAM oligonucleotide: FAM, T*(CT)5 *CCCCCACTGCTGCCTCCCGTAG, complementary to a conserved prokaryotic 16S rRNA sequence.
TET oligonucleotide: TET, T*(CT)5 *CGAAGGTCCCCCTCTTTGGTCTTGC, directed to a conserved enteric bacterial 16S rRNA, with sequence modified from Nelson et al. (ref 12).

Calibration curves had been previously constructed (ref 13), using the oligonucleotides described above, to determine the linear range of the instrument for each of the dyes. The oligonucleotides were diluted to the final concentrations (described below) using tissue culture grade water (Sigma). The solutions were prepared in siliconized 0.5-mL tubes (Robbins, Qbiogene) to minimize the nonspecific attachment of DNA to the walls of the tube, which would reduce the effective concentration of oligonucleotide in solution. The optical 96-well plates (Applied Biosystems, Warrington, U.K.) were sealed with an optical adhesive cover (Applied Biosystems). The well plates were centrifuged for 1 min at 2000 rpm (Jouan MR22i) to ensure that the solution was uniformly distributed at the bottom of the wells.
Absolute fluorescence was measured in an ABI Prism 7700 Sequence Detection System instrument (Applied Biosystems). The instrument was set in the “plate read” mode, and the exposure time was set to 150 ms. Blanks were obtained by measuring the empty plate (sealed with an optical lid) before the solutions were added.

(11) Paton, A. W.; Paton, J. C. J. Clin. Microbiol. 1998, 36, 598-602.
(12) Nelson, B. P.; Liles, M. R.; Frederick, K. B.; Corn, R. M.; Goodman, R. M. Environ. Microbiol. 2002, 4, 735-43.
(13) Faulds, K.; Barbagallo, R. P.; Keer, J. T.; Smith, W. E.; Graham, D. Analyst 2004, 129, 567-68.

Experimental Design and Modeling. To ensure that the calibration space is adequately covered for multivariate calibration, an appropriate experimental design is generally required. The experimental design used on this occasion (generated using MODDE 6.0 by Umetrics), a fractional factorial design (resolution V), was able to characterize the calibration space using 67 calibration solutions, each containing a specified amount of each dye-labeled oligonucleotide. The experimental data were modeled using PLSR without any form of pretreatment (i.e., no mean centering or variance scaling). X is a 67 × 32 matrix (67 calibration samples × 32 wavelength windows collected by the instrument; see Data Collection and Analysis), and Y is a 67 × 8 matrix (where 8 is the number of fluorophores). Each of the 67 calibration samples was left out of the calibration model and predicted using the remaining 66 samples via PLSR. This was repeated until all 67 samples were predicted independently. The number of latent variables used for each PLSR model is shown in Table 1.

Table 1. Maximum Emission Wavelength and Number of Latent Variables Used in the PLS1 Model for Each Dye-Labeled Oligonucleotide

dye      wavelength (max nm) of dye   no. of latent variables
HEX      535                          11
FAM      494                          9
CY5      643                          4
CY3      552                          7
ROX      585                          6
TAMRA    565                          6
R6G      524                          11
TET      521                          7

Table 2. Reliability Measures and R^2 Value for Actual versus Predicted Oligonucleotide Concentration

dye      false positive   false negative   sensitivity   specificity   efficiency   R^2 (actual vs
         rate (%)         rate (%)         (%)           (%)           (%)          predicted concn)
HEX      6.25             11.43            88.57         85.71         91.04        0.71
FAM      0.00             0.00             100.00        91.43         100.00       0.87
CY5      0.00             0.00             100.00        91.43         100.00       0.95
CY3      0.00             2.86             97.14         91.43         98.51        0.88
ROX      0.00             2.86             97.14         91.43         98.51        0.91
TAMRA    3.13             0.00             100.00        88.57         98.51        0.91
R6G      0.00             11.43            88.57         91.43         94.03        0.71
TET      0.00             85.71            14.29         91.43         55.22        0.54

Table 3. Studentized Apparent Prediction Residuals with Expected Values of (f/(f - 2))^(1/2) (a)

dye      s(ti)          (f/(f - 2))^(1/2)
HEX      1.03           1.02
FAM      1.05           1.02
CY5      1.06           1.02
CY3      1.58 (1.08)    1.02
ROX      2.02 (1.08)    1.02
TAMRA    1.63 (1.05)    1.02
R6G      1.05           1.02
TET      0.96           1.02

(a) s(ti) results in parentheses indicate values with suspect calibration samples removed.

Calculation of Sample-Specific Uncertainty Estimates. Conceptually, both univariate and multivariate methods require prediction intervals if meaningful results are to be obtained. However, whereas univariate expressions are well characterized, a generalization to multivariate methodology has been required for some time. The calculation of such univariate model output uncertainties is summarized by several standard expressions derived from elementary statistics. Multivariate models, unfortunately, are inherently more complex, and as a result, theoretical advances with respect to corresponding error analysis have been relatively slow and somewhat varied in their different approaches. The most popular multivariate calibration algorithm, PLSR, has received considerable attention in the chemometrics-related literature with respect to the estimation of sample-specific standard error of prediction (refs 8, 14-20).

(14) Baffi, G.; Martin, E.; Morris, J. Chemom. Intell. Lab. Syst. 2002, 61, 151-65.
(15) Denham, M. C. J. Chemom. 1997, 11, 39-52.
(16) Faber, N. M. Chemom. Intell. Lab. Syst. 2000, 5, 123-34.
(17) Fernandez Pierna, J. A.; Jin, L.; Wahl, F.; Faber, N. M.; Massart, D. L. Chemom. Intell. Lab. Syst. 2003, 65, 281-91.
(18) Lopes, J. A.; Menezes, J. C. Chemom. Intell. Lab. Syst. 2003, 68, 75-81.
(19) Morsing, T.; Ekman, C. J. Chemom. 1998, 12, 295-9.
(20) Phatak, A.; Reilly, P. M.; Penlidis, A. Anal. Chim. Acta 1993, 277, 495-501.

The estimation of sample-specific standard error of prediction used in this study is a simplification derived under the error in variables model by Faber and Bro (ref 8) and calculated using software written in-house (Matlab version 6.5, release 13). The expression for the estimation of sample-specific standard error of prediction (eq 1) has three components, namely, MSEC (eq 2), h, and VΔy. Respectively, these are the mean squared error of calibration, the sample-specific leverage, and the variance in the reference values using a particular calibration method. For this study, VΔy was set to zero, thereby giving prediction intervals with the largest possible coverage. Although future work may indicate more suitable values for VΔy through estimation of the combined measurement error in the reference values, validation of the sample-specific uncertainty estimates proved to be satisfactory (Table 3). Both the MSEC and subsequent confidence interval estimates used degrees of freedom calculated using the method proposed by Van der Voet (ref 21).

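$$\sigma_{PE} \approx \left[(1 + h)\,\mathrm{MSEC} - V_{\Delta y}\right]^{1/2} \qquad (1)$$

$$\mathrm{MSEC} = \frac{\sum_{i=1}^{I} \left(\hat{y}_i - \tilde{y}_i\right)^2}{I - f} \qquad (2)$$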
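As a rough numerical illustration of eqs 1 and 2, the following sketch evaluates both expressions for a handful of made-up numbers. It is not the in-house Matlab code mentioned in the text: the function names, the example values, and the leverage of 0.44 are all hypothetical, and the leverage h is simply passed in rather than computed from a PLSR model. The argument `f` is whatever quantity appears in the denominator of eq 2.

```python
import math


def msec(y_pred, y_meas, f):
    """Eq 2: sum of squared calibration residuals divided by (I - f)."""
    n_samples = len(y_meas)  # I in eq 2
    if n_samples - f <= 0:
        raise ValueError("I - f must be positive")
    ss = sum((p - m) ** 2 for p, m in zip(y_pred, y_meas))
    return ss / (n_samples - f)


def sigma_pe(leverage, msec_value, v_delta_y=0.0):
    """Eq 1: sample-specific standard error of prediction.

    v_delta_y defaults to zero, the setting used in this study.
    """
    return math.sqrt((1.0 + leverage) * msec_value - v_delta_y)


# Hypothetical calibration values, for illustration only.
y_meas = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]

example_msec = msec(y_pred, y_meas, f=1)        # 0.10 / 4
example_sigma = sigma_pe(0.44, example_msec)    # sqrt(1.44 * 0.025)
```

Because the leverage enters as (1 + h), a sample lying far from the centre of the calibration space gets a wider interval than one close to it, which is what makes the estimate sample specific.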

Here ŷi, ỹi, I, and f are the predicted and actual calibration values, the total number of calibration samples, and the degrees of freedom, respectively, with the “tilde” (∼) indicating that the associated quantity is measured. The rigorous study of Van der Voet has clearly established that the conventional number, i.e., a single degree of freedom for each factor, is not correct. A theoretically sound alternative can be calculated using the results of leave-one-out cross-validation (see eq 26 in ref 21).

(21) van der Voet, H. J. Chemom. 1999, 13, 195-208.

Figure 1. RMSEC and RMSECV plot for CY5.

Figure 2. RMSEC and RMSECV plot for TAMRA.

Validation of Proposed Sample-Specific Uncertainty Estimates. Comparing the coverage probabilities of the prediction intervals with the nominal value would validate the expression used to calculate such intervals. Unfortunately, this requires error-free reference values. However, it has been shown that an equivalent test (eq 3) follows from the studentised apparent prediction residuals

$$t_i = \frac{\hat{y}_i - y_{\mathrm{ref},i}}{\sigma_{PE}}, \qquad i = 1, \ldots, I \qquad (3)$$

These should be approximately distributed as Student’s t with the degrees of freedom (f) associated with the MSEC estimate (eq 2). In particular, the standard deviation should be close to (f/(f - 2))^(1/2) (ref 8). Consequently, demanding that the true prediction error be predicted correctly on average amounts to demanding that the observed standard deviation approach (f/(f - 2))^(1/2), which is easily calculated.

Selection of Optimum Rank of PLSR Models. Faber (ref 16) found the performance of eq 3 to rely heavily on the ability to correctly estimate the optimum model dimensionality. The study of Denham (refs 15, 22) puts the same demands on the use of eq 3. The procedure used here in identifying the correct number of latent variables was based upon finding the number that gave either a minimum root-mean-square error of cross-validation (RMSECV) or the point where the RMSECV reached a plateau, the method advocated by Faber and Bro (ref 8). For example, RMSECV plots for the CY5 (Figure 1) and TAMRA (Figure 2) labeled oligonucleotides show four and six latent variable PLSR models to be appropriate, with RMSECV values of ∼0.5 × 10^-6 and ∼0.28 × 10^-6, respectively.

Data Collection and Analysis. The data were obtained as a spectrum divided into 32 bins (each corresponding to ∼4.7 nm) spanning 500 to 650 nm. Experiments to measure the signals from multiplexed analyses were designed to minimize the number of experiments conducted while obtaining sufficient information for subsequent evaluation of data deconvolution approaches (see Experimental Design and Modeling). The resultant experimental runs each had a specific combination of the 8 dyes in 67 wells, with the fluorophores either absent (0) or present at the following oligonucleotide concentrations: Cy3 at 1.3 × 10^-7 M/6.5 × 10^-8 M, Cy5 at 5.3 × 10^-6 M/2.65 × 10^-6 M, FAM at 1.3 × 10^-6 M/6.5 × 10^-7 M, ROX at 3.8 × 10^-7 M/1.9 × 10^-7 M, HEX at 1.2 × 10^-7 M/6.0 × 10^-8 M, TET at 2.7 × 10^-7 M/1.35 × 10^-7 M, TAMRA at 1.9 × 10^-7 M/3.5 × 10^-8 M, and R6G at 6.1 × 10^-7 M/3.05 × 10^-7 M. To complete the design, three center points were measured in which each oligonucleotide was present at half the above concentrations. To obtain these combinations, 5 µL of each solution was added to each well (Applied Biosystems) and the final volume was adjusted to 40 µL with water. The fluorescence obtained from an empty plate was then subtracted from the signals and the data were modeled.

(22) Denham, M. C. J. Chemom. 2000, 14, 351-61.

RESULTS AND DISCUSSION

An example of the individual spectrum of each fluorophore and the overall signal obtained when all eight fluorophores are multiplexed is shown in Figure 3. As can be observed, the peak signals in the spectra of the dyes overlap significantly. The modeling of the data using PLSR has clearly removed a substantial amount of irrelevant information (Table 1). In the case of CY5, only 4 latent variables were required out of a possible 32. Because each latent variable successively describes less and less of the covariance between X and Y, a point is reached where a tradeoff between the number of latent variables and the remaining model error has to be made. This point is commonly taken as that at which the root-mean-square error of cross-validation (RMSECV, see Experimental Section) begins to plateau (see Figures 1 and 2).
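The rank-selection idea above can be sketched in a few lines of code. What follows is a minimal, assumption-laden illustration and not the authors' implementation: it uses a textbook NIPALS-style PLS1 without centering or scaling, leave-one-out cross-validation, and a tiny noise-free data set made of two hypothetical overlapping "spectra" in four channels. All function names and numbers are invented for the example.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_lv):
    """Fit PLS1 with up to n_lv latent variables (no centering or scaling)."""
    X = [list(row) for row in X]
    y = list(y)
    model = []
    for _ in range(n_lv):
        # Weight vector: direction in X of maximum covariance with y.
        w = [dot([row[j] for row in X], y) for j in range(len(X[0]))]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:  # nothing left to model
            break
        w = [v / norm for v in w]
        t = [dot(row, w) for row in X]             # scores
        tt = dot(t, t)
        p = [dot([row[j] for row in X], t) / tt for j in range(len(w))]  # X loadings
        q = dot(y, t) / tt                          # y loading
        # Deflate X and y before extracting the next latent variable.
        X = [[xij - ti * pj for xij, pj in zip(row, p)] for row, ti in zip(X, t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        model.append((w, p, q))
    return model


def pls1_predict(model, x):
    """Predict y for one spectrum x by repeating the deflation steps."""
    x = list(x)
    y_hat = 0.0
    for w, p, q in model:
        t = dot(x, w)
        y_hat += q * t
        x = [xj - t * pj for xj, pj in zip(x, p)]
    return y_hat


def rmsecv(X, y, max_lv):
    """Leave-one-out RMSECV for models with 1..max_lv latent variables."""
    scores = []
    for n_lv in range(1, max_lv + 1):
        press = 0.0
        for i in range(len(y)):
            model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_lv)
            press += (pls1_predict(model, X[i]) - y[i]) ** 2
        scores.append(math.sqrt(press / len(y)))
    return scores


# Hypothetical data: two overlapping spectra, six mixtures, dye 1 is the analyte.
S = [[1.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 1.0]]
C = [[1, 0], [0, 1], [1, 1], [2, 1], [1, 2], [2, 0]]
X = [[c[0] * S[0][j] + c[1] * S[1][j] for j in range(4)] for c in C]
y = [float(c[0]) for c in C]

scores = rmsecv(X, y, max_lv=3)
best_n_lv = scores.index(min(scores)) + 1  # simple minimum rule
```

In this noise-free toy problem the cross-validation error collapses once two latent variables are used, because only two dyes vary. With real spectra the curve typically flattens rather than dropping to zero, and the plateau described above has to be judged from the RMSECV plot rather than taken from a bare minimum.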
The predictive classification results, in the form of reliability measures, using PLSR are shown in Table 2 (see the Appendix for actual values, predicted values, and confidence intervals) for the dye-labeled oligonucleotides HEX, FAM, CY5, CY3, ROX, TAMRA, R6G, and TET, respectively (for reliability measure definitions, see the Appendix). It is evident from the table that the method is correctly predicting the presence of the dye-labeled oligonucleotides in the majority of cases, with efficiencies ranging from 91.04 to 100% for seven of the eight oligonucleotides. For five out of a possible eight oligonucleotides, the efficiency was >98%, and in two instances (FAM and CY5), the efficiency was 100%. It is evident that a problem exists with TET; in this instance, the efficiency was only ∼55%. Investigations show a relatively large spread in the calibration response values (hence the poor R^2 value of only 0.54), due to reproducibility problems with the use of this particular dye label. The relatively high false positive rate for HEX (6.25%) may be the result of contamination, a relatively poor predictive model (R^2 ∼ 0.71), or a combination of both. The major problem appears to be in not detecting the presence of a given analyte when it is actually present (false negative rate). It is evident from Table 2 that classification efficiency is related to the quality of the PLS predictive models, as shown by the R^2 values relating the linear dependence of the actual and predicted oligonucleotide concentrations.

Figure 3. Total and individual spectra of eight fluorophores, measured between 500 and 650 nm using the ABI SDS PRISM 7700.

Studentized apparent prediction residuals, ti (eq 3), are distributed approximately as Student’s t. In particular, the standard deviation, s(ti), should be close to (f/(f - 2))^(1/2) for f degrees of freedom (see Experimental Section). A comparison of the values of s(ti) and (f/(f - 2))^(1/2) for each dye label (Table 3) shows that for the majority s(ti)/(f/(f - 2))^(1/2) ≈ 1, the exceptions being CY3, ROX, and TAMRA. The 95% confidence intervals for CY3, ROX, and TAMRA (Figure 4a-c, respectively) show the incorrect classification of sample 10 for oligonucleotides labeled with CY3 and ROX, which again suggests a problem with sample makeup. The same explanation is the most likely for sample 62 for TAMRA-labeled oligonucleotides.
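The validation check discussed here reduces to comparing two numbers per dye. The sketch below shows one way this might be computed; it is illustrative only. The function names and the example values are hypothetical, and s(ti) is taken to be the ordinary sample standard deviation of the residuals, which is an assumption about how the tabulated values were obtained.

```python
import math


def studentized_residuals(y_pred, y_ref, sigma_pe):
    """Eq 3: one residual per sample, scaled by its own sigma_PE."""
    return [(p - r) / s for p, r, s in zip(y_pred, y_ref, sigma_pe)]


def sample_sd(values):
    """Ordinary sample standard deviation (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def expected_t_sd(f):
    """Standard deviation of Student's t with f degrees of freedom (f > 2)."""
    if f <= 2:
        raise ValueError("the standard deviation is only defined for f > 2")
    return math.sqrt(f / (f - 2))


# Hypothetical predictions, reference values, and standard errors.
t_values = studentized_residuals([1.2, 0.8], [1.0, 1.0], [0.1, 0.2])
observed = sample_sd(t_values)
expected = expected_t_sd(52)  # hypothetical f; gives about 1.02
ratio = observed / expected
```

A ratio well above 1 means the intervals are too narrow for the errors actually observed, which is the pattern that flags CY3, ROX, and TAMRA before the suspect samples are removed.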
Removal of those samples believed to have incorrect concentrations reduces the s(ti)/(f/(f - 2))^(1/2) ratio of these particular oligonucleotides to ∼1. Although not perfect, the empirically observed values, s(ti), are close to the expected values, giving evidence that the proposed confidence interval estimates are providing the appropriate coverage.

The ability of the method to determine the presence, or otherwise, of each fluorophore rests with its ability to construct a confidence interval from a multivariate regression model (see Experimental Section). Provided such intervals can be constructed, it is a relatively straightforward task to determine whether a given fluorophore is present. This ability to effectively interpret signals from a multiplexed analysis has a number of potential applications. Here the oligonucleotide probes have been designed for detection of verocytotoxin-producing E. coli (VTEC), a human pathogen of significant public health concern (ref 23). Tests are required to determine the presence of VTEC and other bacterial pathogens in foods, water, and the environment, and methods enabling simultaneous screening for a range of disease-causing organisms are being developed (refs 24, 25). The probe sequences used in this work were directed to a series of loci ranging from conserved prokaryotic and enteric bacterial sequences to specific verocytotoxin-producing genes (see Experimental Section). Detection of a positive signal with the universal prokaryotic probe would point to the presence of prokaryotes in the sample under test, while an additional signal with the enteric 16S rRNA probe would indicate the presence of human enteric bacteria. Additional positive signals with the intimin gene probe and the oligonucleotides directed to the VT-producing genes would give further information on the pathogenicity of the bacterial contaminant(s) present in the sample, demonstrating the presence of adhesin (ref 26) and verocytotoxin (ref 27) producing organisms, respectively. Similar multiplexing strategies, currently using a range of different detection systems, are being developed to test for the presence of bioterrorism agents (ref 28), genetically modified organisms (refs 29, 30), and genetic predisposition to disease (ref 31).

(23) Hussein, H. S.; Omaye, S. T. Exp. Biol. Med. (Maywood) 2003, 228, 331-2.
(24) Selvapandiyan, A.; Stabler, K.; Ansari, N. A.; Kerby, S.; Riemenschneider, J.; Salotra, P.; Duncan, R.; Nakhasi, H. L. J. Mol. Diagn. 2005, 7, 268-75.
(25) Abd-El-Haleem, D.; Kheiralla, Z. H.; Zaki, S.; Rushdy, A. A.; Abd-El-Rahiem, W. J. Environ. Monit. 2003, 5, 865-70.
(26) Jerse, A. E.; Yu, J.; Tall, B. D.; Kaper, J. B. Proc. Natl. Acad. Sci. U.S.A. 1990, 87, 7839-43.
(27) Willshaw, G. A.; Smith, H. R.; Scotland, S. M.; Field, A. M.; Rowe, B. J. Gen. Microbiol. 1987, 133, 1309-17.
(28) Varma-Basil, M.; El Hajj, H.; Marras, S. A.; Hazbon, M. H.; Mann, J. M.; Connell, N. D.; Kramer, F. R.; Alland, D. Clin. Chem. 2004, 50, 1060-2.
(29) James, D.; Schmidt, A. M.; Wall, E.; Green, M.; Masri, S. J. Agric. Food Chem. 2003, 51, 5829-34.
(30) Rudi, K.; Rud, I.; Holck, A. Nucleic Acids Res. 2003, 31, e62.
(31) Hogervorst, F. B.; Nederlof, P. M.; Gille, J. J.; McElgunn, C. J.; Grippeling, M.; Pruntel, R.; Regnerus, R.; van Welsem, T.; van Spaendonk, R.; Menko, F. H.; Kluijt, I.; Dommering, C.; Verhoef, S.; Schouten, J. P.; van’t Veer, L. J.; Pals, G. Cancer Res. 2003, 63, 1449-53.

Figure 4. Predictive classification results for (a) CY3, (b) ROX, and (c) TAMRA (filled square indicates incorrect classification, open square indicates a correct concentration, horizontal line indicates zero concentration).

The use of fluorescent labeling for multiplexed analysis usually relies on the use of algorithms provided with particular equipment for deconvolution of the mixed spectral signals. This approach is limited by the degree of spectral overlap and wide signal peaks of many commercially available fluorophores, as already discussed. The majority of quantitative real-time PCR methods use two or three multiplexed reactions (ref 32), with a recent quantitative reverse transcription PCR study successfully utilizing a four-color multiplex (ref 34). A report by Lee et al. (ref 33) demonstrated the simultaneous use of seven labels to enable a six-target multiplex to be successfully performed and analyzed, using carefully chosen fluorophores with well-resolved spectra. Despite these improvements, the current capacity of multiplexed analyses is limited by the ability to deconvolute the measured signals.

(32) Corless, C. E.; Guiver, M.; Borrow, R.; Edwards-Jones, V.; Fox, A. J.; Kaczmarski, E. B. J. Clin. Microbiol. 2001, 39, 1553-8.
(33) Lee, L. G.; Livak, K. J.; Mullah, B.; Graham, R. J.; Vinayak, R. S.; Woudenberg, T. M. Biotechniques 1999, 27, 342-9.
(34) Persson, K.; Hamby, K.; Ugozzoli, L. A. Anal. Biochem. 2005, 344, 33-42.

Our approach relies on a novel method for analyzing the mixed spectral signals from a multiplexed assay to determine the presence or absence of each fluorophore, although here the principle has been established using simple mixtures of labeled oligonucleotides. In this study, advances in the theoretical derivation of sample-specific confidence interval estimation for PLSR (ref 8) have been exploited to increase the potential number of assays that can be multiplexed in a single measurement (ref 34).
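The presence/absence decision and its evaluation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: a fluorophore is called present when the lower bound of its prediction interval lies above zero, the critical t value is supplied by the caller (2.01 is roughly the two-sided 95% value for about 50 degrees of freedom), and every number and name in the example is hypothetical. The reliability measures follow the expressions in Table 12.

```python
def classify_presence(predicted, sigma_pe, t_crit):
    """Call a fluorophore present if predicted - t_crit * sigma_PE > 0."""
    half_width = t_crit * sigma_pe
    lower = predicted - half_width
    upper = predicted + half_width
    return lower > 0.0, (lower, upper)


def reliability_measures(actual_present, called_present):
    """Tally TP/FP/TN/FN and apply the Table 12 expressions.

    Assumes at least one truly present and one truly absent sample.
    """
    tp = fp = tn = fn = 0
    for actual, called in zip(actual_present, called_present):
        if actual and called:
            tp += 1
        elif actual:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    return {
        "false_positive_rate": fp / (tn + fp),
        "false_negative_rate": fn / (tp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "efficiency": (tp + tn) / (tp + tn + fp + fn),
    }


# Hypothetical predictions (M), standard errors (M), and true status.
predicted = [1.3e-7, 6.5e-8, 1.0e-8, -0.5e-8, 3.0e-8, 5.0e-8]
sigma = [2.0e-8] * 6
actual = [True, True, False, False, True, False]

T_CRIT = 2.01  # assumed critical value, see lead-in
calls = [classify_presence(p, s, T_CRIT)[0] for p, s in zip(predicted, sigma)]
measures = reliability_measures(actual, calls)
```

In the example, the fifth sample is truly present but its interval straddles zero, so it is counted as a false negative, while the sixth is absent but predicted far enough above zero to be called present. These are the two kinds of error that the false negative and false positive rates summarize.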


CONCLUSIONS

In this paper, the detection of fluorophores relies on a novel method of analyzing the mixed spectral signals in a multiplexed assay. The principle has been established using relatively simple mixtures of labeled oligonucleotides; however, it is conjectured that it will perform successfully with samples that (i) contain a higher number of analytes, (ii) possess significant spectral overlap, or (iii) both. Pivotal to the method are the recent advances in the theoretical derivation of sample-specific confidence interval estimation for PLSR; where theoretical improvements are made, the detection method will likewise benefit. Although in this study the variance in the reference values (VΔy) was set to zero (giving prediction intervals with the largest possible coverage), future work may indicate more suitable values for VΔy through estimation of the combined measurement error in the reference values. Currently, for real-time PCR, routinely up to two fluorescently labeled reporter molecules and a third passive reference fluorophore are used in each analysis, although greater multiplexing can be achieved with appropriate choice of fluorophore labels and experimental conditions. Economies of time, sample, and reagents are among the significant benefits of performing increasingly multiplexed analyses. The approach presented here for overcoming the practical limitations on multiplexed fluorescent measurement has the potential to significantly increase analytical throughput, enabling simultaneous detection of multiple loci with a variety of molecular techniques. Advantageously, the described method may already be utilized for the analysis of direct fluorescent signals as demonstrated here. Further work to establish the utility of the approach for quantitative analysis will broaden the applicability of the method, potentially allowing the interpretation of highly multiplexed real-time amplification data.

APPENDIX

Tables 4-11 give actual values, predicted values, and their respective 95% confidence intervals. Under the heading “Classification”, “Incorrect” indicates that the predicted value ± 95% prediction interval (PI) does not include the actual oligonucleotide concentration; “Correct” indicates that the predicted value ± 95% PI does include the actual oligonucleotide concentration.

Table 4. Actual, Predicted Oligonucleotide (HEX Labeled) Concentration Values (M) and 95% PI
Table 5. Actual, Predicted Oligonucleotide (FAM Labeled) Concentration Values (M) and 95% PI
Table 6. Actual, Predicted Oligonucleotide (CY5 Labeled) Concentration Values (M) and 95% PI
Table 7. Actual, Predicted Oligonucleotide (CY3 Labeled) Concentration Values (M) and 95% PI
Table 8. Actual, Predicted Oligonucleotide (ROX Labeled) Concentration Values (M) and 95% PI
Table 9. Actual, Predicted Oligonucleotide (TAMRA Labeled) Concentration Values (M) and 95% PI
Table 10. Actual, Predicted Oligonucleotide (R6G Labeled) Concentration Values (M) and 95% PI
Table 11. Actual, Predicted Oligonucleotide (TET Labeled) Concentration Values (M) and 95% PI

Reliability measure expressions used in this study are defined as shown in Table 12, where the individual expression elements are defined as shown in Table 13.

Table 12. Reliability Measures in Qualitative Analysis

reliability measure     expression
false positive rate     FP/(TN + FP)
false negative rate     FN/(TP + FN)
sensitivity             TP/(TP + FN)
specificity             TN/(TN + FP)
efficiency              (TP + TN)/(TP + TN + FP + FN)

Table 13. Response Numbers in Qualitative Analysis

                 observed (a)
actual      negative   positive   total
negative    TN         FP         nN,a
positive    FN         TP         nP,a
total       nN,obs     nP,obs     ntot

(a) TP, number of true positives; FP, number of false positives; TN, number of true negatives; FN, number of false negatives.

It is also important to note that the terms “sensitivity” and “specificity” have specific meanings in relation to qualitative data in many fields, including microbiology. The terms used in this paper are therefore defined as follows (ref 35):
(a) false positive rate, the probability of obtaining a positive result given that no analyte is present;
(b) specificity, the probability of obtaining a negative result given that no analyte is present, also equal to “1 - (false positive rate)”;
(c) false negative rate, the probability of obtaining a negative result given that the analyte is present;
(d) sensitivity, for a qualitative test generally taken to mean the probability of obtaining a positive result given that the analyte is present [i.e., “1 - (false negative rate)”].

ACKNOWLEDGMENT

The authors would like to thank the UK Department of Trade & Industry (DTI) for funding this work through the Valid Analytical Measurement program (VAM) and Measurements for Biotechnology program (MfB). The authors would also like to express their thanks to Dr. S.L.R. Ellison for his valuable comments throughout this work.

Received for review September 13, 2005. Accepted November 16, 2005.
AC051635P

(35) Ellison, S. L. R.; Fearn, T. Trends Anal. Chem. 2005, 24, 468-76.
