
Anal. Chem. 2007, 79, 4967-4976

Systematic Comparison of Bias and Precision Data Obtained with Multiple-Point and One-Point Calibration in Six Validated Multianalyte Assays for Quantification of Drugs in Human Plasma

Frank T. Peters* and Hans H. Maurer

Department of Experimental and Clinical Toxicology, Institute of Experimental and Clinical Pharmacology and Toxicology, Saarland University, Building 46, D-66421 Homburg (Saar), Germany

One-point (linear through zero) calibration is often used as a compromise between necessary calibration, workload, and time. The aim of the present study was to systematically check the applicability of one-point calibration by comparing bias and precision data obtained with full and one-point calibration. Data from validation studies of six mass spectrometry-based multianalyte bioanalytical assays were used for this purpose. Bias and intermediate precision datasets of full multiple-point calibration were compared to six one-point calibration datasets (A-F in rising calibrator concentration order) calculated from the same raw data. The datasets were statistically compared using the Friedman test followed by Dunn’s multiple comparison test. The results obtained with full calibration and the different one-point calibrations were found to differ significantly (P < 0.05) in all of the six studied methods. The best one-point calibration results were obtained with calibrator D, with which acceptance criteria for bias and precision were fulfilled for the majority of analytes. However, some extremely high bias and precision data were obtained for some analytes in the low-concentration range. In conclusion, one-point calibration with a calibrator close to the center of the full calibration range can be a feasible alternative to full calibration.

One-point calibration (linear through zero) is often used in routine chemical analysis because of its efficiency with respect to time, workload, and resources. Theoretically, it is applicable when the concentration-response function is linear and the y-intercept is negligibly small. One-point calibration has been described in various areas of analytical chemistry such as analytical toxicology,1-5 therapeutic drug monitoring (TDM),6-8

* To whom correspondence should be addressed. Phone: +49-6841-16-26430. Fax: +49-6841-16-26051. E-mail: [email protected].
(1) Whiting, T. C.; Liu, R. H.; Chang, W. T.; Bodapati, M. R. J. Anal. Toxicol. 2001, 25, 179-189.
(2) Tsai, Y. Y.; Tai, S. J.; Huang, M. C.; Chang, B. L.; Liao, C. H.; Liu, R. H. J. Food Drug Anal. 1999, 7, 177-184.
(3) Rasanen, I.; Kontinen, I.; Nokua, J.; Ojanpera, I.; Vuori, E. J. Chromatogr., B: Anal. Technol. Biomed. Life Sci. 2003, 788, 243-250.
(4) Singh, J.; Elberling, J. A.; Hemphill, D. G.; Holmstrom, J. J. Anal. Toxicol. 1999, 23, 137-140.

10.1021/ac070054s CCC: $37.00 Published on Web 05/23/2007

© 2007 American Chemical Society

clinical chemistry,9,10 pharmaceutical analysis,11-13 and others.14-18 Considering its widespread use, surprisingly few studies have been dedicated to systematic evaluation of its applicability and performance in comparison to other calibration approaches. Theoretical aspects of one-point calibration including potential errors associated with it were extensively discussed by Kemp.19 In practical applications, different strategies have been used for one-point calibration. The most common approach was measuring the response of a single calibrator and calculating the concentrations of unknown samples via their responses in relation to that of the calibrator.1,2,4-8,11-13,15 Another strategy was very similar but involved adjusting the absolute amount of analyte in the sample to closely match that in the calibrator by varying the amount of sample analyzed.9,10,14 A third strategy involved transformation of exponential concentration-response functions to linear through zero functions allowing one-point calibration.16 Generally, the bias and/or precision data obtained with these strategies were acceptable.1,2,4-12,14,16,18 Studies comparing one-point calibration to other calibration approaches yielded contradictory results. Lyytikainen and Pellinen found three-point linear calibration was superior to one-point calibration in a method for determination of chlorinated phenolics in water, but they had used a calibration

(5) Liu, R. H.; McKeehan, A. M.; Edwards, C.; Foster, G.; Bensley, W. D.; Langner, J. G.; Walia, A. S. J. Forensic Sci. 1994, 39, 1504-1514.
(6) Taylor, P. J.; Forrest, K. K.; Salm, P.; Pillans, P. I. Ther. Drug Monit. 2001, 23, 726-727.
(7) Taylor, P. J.; Hogan, N. S.; Lynch, S. V.; Johnson, A. G.; Pond, S. M. Clin. Chem. 1997, 43, 2189-2190.
(8) Wang, S.; Magill, J. E.; Vicente, F. B. Arch. Pathol. Lab. Med. 2005, 129, 661-665.
(9) Thienpont, L. M.; Van Nuwenborg, J. E.; Reinauer, H.; Stockl, D. Clin. Biochem. 1996, 29, 501-508.
(10) Thienpont, L. M.; Van Nuwenborg, J. E.; Stockl, D. J. Chromatogr., A 1995, 706, 443-450.
(11) Zhou, W.; Liu, W.; An, D. J. Chromatogr. 1992, 589, 358-361.
(12) Walters, D. L.; Strong, C. R.; Green, S. V.; Curtis, M. A. J. Chromatogr., B: Biomed. Sci. Appl. 1995, 670, 299-307.
(13) Leito, S.; Molder, K.; Kunnapas, A.; Herodes, K.; Leito, I. J. Chromatogr., A 2006, 1121, 55-63.
(14) Preu, M.; Petz, M. Analyst 1998, 123, 2785-2788.
(15) Lyytikainen, M.; Pellinen, J. Toxicol. Environ. Chem. 1997, 63, 185-197.
(16) Kimball, B. A.; Arjo, W. M.; Johnston, J. J. J. Liq. Chromatogr. Relat. Technol. 2004, 27, 1835-1848.
(17) Merson, S.; Evans, P. J. Anal. At. Spectrom. 2003, 18, 372-375.
(18) Jagner, D.; Renman, L.; Stefansdottir, S. H. Anal. Chim. Acta 1993, 281, 305-314.
(19) Kemp, G. J. Clin. Chem. 1984, 30, 1163-1167.

Analytical Chemistry, Vol. 79, No. 13, July 1, 2007 4967

Table 1. Analytes, Internal Standards, Sample Workup, Analytical Systems, and References of Assays I-VI(a)

Assay I (ref 20)
  analytes: diazepam, methohexital, midazolam, nordiazepam, pentobarbital, phenobarbital, propofol, thiopental
  internal standards: diazepam-d5, methohexital-d5, nordiazepam-d5, pentobarbital-d5, phenobarbital-d5
  workup: LLE (butyl acetate)
  apparatus: AT HP 6890 series GC, AT HP 5973 series MSD
  stationary phase: HP-1 (12 m × 0.2 mm, 330 nm)
  mobile phase: helium (0.6 mL/min)
  detection: EI MS, SIM

Assay II (ref 21)
  analytes: R-MDA, S-MDA, R-MDEA, S-MDEA, R-MDMA, S-MDMA
  internal standards: R-MDA-d5, S-MDA-d5, R-MDMA-d5, S-MDMA-d5
  workup: SPE (confirm HCX), derivatization with HFBPCl
  apparatus: AT HP 6890 series GC, AT HP 5973 series MSD
  stationary phase: HP-5MS (30 m × 0.25 mm, 250 nm)
  mobile phase: helium (1.0 mL/min)
  detection: NICI MS (methane), SIM

Assay III (ref 22)
  analytes: AM, BDB, BZP, EA, HO-AM, MA, MBDB, mCPP, MDA, MDBP, MDEA, MDMA, MeOPP, MTA, pholedrine, PMA, PMMA, TFMPP
  internal standards: AM-d5, MA-d5, MDA-d5, MDEA-d5, MDMA-d5, pTP
  workup: SPE (confirm HCX), derivatization with HFBA
  apparatus: AT HP 5972 series MSD
  stationary phase: HP-5MS (30 m × 0.25 mm, 250 nm)
  mobile phase: helium (0.6 mL/min)
  detection: EI MS, SIM

Assay IV (ref 23)
  analytes: amisulpride, bromperidol, clozapine, norclozapine, clozapine-N-oxide, droperidol, flupenthixol, fluphenazine, haloperidol, melperone, olanzapine, perazine, pimozide, risperidone, hydroxyrisperidone, sulpiride, zotepine, zuclopenthixol
  internal standard: trimipramine-d3
  workup: SPE (confirm HCX)
  apparatus: AT 1100 LC/MS system
  stationary phase: Merck LiChroCART Superspher RP Select (125 × 2 mm)
  mobile phase: gradient, ammonium formate (5 mM, pH 3)/acetonitrile
  detection: APCI MS, SIM

Assay V (ref 24)
  analytes: acebutolol, diacetolol, alprenolol, atenolol, betaxolol, bisoprolol, bupranolol, carazolol, carteolol, carvedilol, celiprolol, esmolol, labetalol, metoprolol, nadolol, nebivolol, oxprenolol, penbutolol, propranolol, sotalol, talinolol, timolol
  internal standard: trimipramine-d3
  workup: SPE (confirm HCX)
  apparatus: AT 1100 LC/MS system
  stationary phase: Merck LiChroCART Superspher RP Select (125 × 2 mm)
  mobile phase: gradient, ammonium formate (5 mM, pH 3)/acetonitrile
  detection: APCI MS, SIM

Assay VI (ref 25)
  analytes: alprazolam, bromazepam, brotizolam, camazepam, chlordiazepoxide, clobazam, clonazepam, diazepam, flumazenil, flunitrazepam, flurazepam, desalkylflurazepam, lorazepam, lormetazepam, medazepam, metaclazepam, midazolam, nitrazepam, nordiazepam, oxazepam, prazepam, temazepam, tetrazepam, triazolam, zaleplone, zolpidem, zopiclone
  internal standards: diazepam-d5, flunitrazepam-d7, nordiazepam-d5, trimipramine-d3
  workup: LLE (diethyl ether/ethyl acetate, 1:1 v/v)
  apparatus: AT 1100 LC/MS system
  stationary phase: Merck LiChroCART Superspher RP Select (125 × 2 mm)
  mobile phase: gradient, ammonium formate (5 mM, pH 3)/acetonitrile
  detection: APCI MS, SIM

a Abbreviations: MDA, 3,4-methylenedioxyamphetamine; MDEA, 3,4-methylenedioxyethylamphetamine; MDMA, 3,4-methylenedioxymethamphetamine; AM, amphetamine; BDB, 1-(1′,3′-benzodioxol-5′-yl)-2-butanamine; BZP, 1-benzylpiperazine; EA, ethylamphetamine; HO-AM, 4-hydroxyamphetamine; MA, methamphetamine; MBDB, N-methyl-1-(1′,3′-benzodioxol-5′-yl)-2-butanamine; mCPP, 1-(3-chloro-phenyl)-piperazine; MDBP, 1-(3,4-methylenedioxybenzyl)-piperazine; MeOPP, 1-(4-methoxyphenyl)-piperazine; MTA, 4-methylthio-amphetamine; PMA, 4-methoxyamphetamine; PMMA, 4-methoxymethamphetamine; TFMPP, 1-(3-trifluoromethylphenyl)-piperazine; pTP, p-tolylpiperazine; LLE, liquid-liquid extraction; SPE, solid-phase extraction; HFBA, heptafluorobutyric anhydride; HFBPCl, heptafluorobutyrylprolyl chloride; AT, Agilent Technologies; GC, gas chromatograph; MSD, mass selective detector; EI, electron ionization; MS, mass spectrometry; SIM, selected ion monitoring; NICI, negative-ion chemical ionization; APCI, atmospheric pressure chemical ionization.

range spanning 3 orders of magnitude and gas chromatography (GC) with electron capture detection (ECD). They attributed their findings to the relatively narrow linear response range of ECD.15 Preu and Petz found one-point calibration to be superior to linear bracketing in an isotope dilution GC-mass spectrometry (MS) method for benzylpenicillin in bovine muscle.14 In two liquid chromatography-tandem MS (LC/MS/MS) methods for determination of the immunosuppressant drugs tacrolimus7 and sirolimus6 in plasma samples, Taylor and co-workers found no relevant differences in bias and precision data obtained with multiple- and one-point calibration. However, they were able to reduce the time required for analysis of a batch of samples considerably by using one-point calibration. Thienpont et al.10 and Zhou et al.11 reported similar findings for methods for determination of electrolytes in serum and Ringer’s injection solution, respectively. Although bias and precision data showed no relevant differences between multiple- and one-point calibration, the latter proved to be more practical. Whiting et al.1 and Liu et al.5 compared one-point calibration with linear and hyperbolic multiple-point calibration in barbiturate analysis, with a special focus on the effects of cross-contribution from stable-isotope-labeled standards. They found that in the absence of cross-contribution one-point calibration was applicable and effective. Finally, in a very recent publication, Leito et al.13 compared one-point calibration and multiple-point calibration with respect to their contribution to uncertainty budgets in LC analysis of pharmaceutical products. They found that the results for both calibration types agreed very well with only slightly better values for multiple-point calibration.

The above-mentioned publications clearly demonstrate that one-point calibration can be a feasible alternative to multiple-point calibration in single-analyte procedures, but its usefulness in multianalyte procedures has not been addressed so far. However, multianalyte procedures are very versatile in certain situations. When several related compounds have to be analyzed in the same sample, the sample needs to be analyzed only once if the method used covers all relevant compounds. Furthermore, the number of methods established in a laboratory can be reduced when multianalyte procedures are used that cover a number of relevant analytes of which one or more may be present in the samples to be analyzed. In the present paper, the feasibility of one-point calibration in multianalyte procedures will be evaluated by retrospective analysis of validation data from six validated bioanalytical multianalyte procedures. On the basis of the presented findings, a procedure for choosing optimum one-point calibrators in such methods will be proposed.

EXPERIMENTAL SECTION

Data from the validation studies of six GC/MS- or LC/MS-based multianalyte assays for quantification of the following drugs/drug classes in human plasma were used in this study: 8 drugs

Figure 1. Box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) for assay I as obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high (right panels) QC concentrations. The whiskers represent the lower and upper extreme value, the boxes the interquartile ranges, and the lines within the box the median values (n = 8). The dotted lines indicate the acceptance limits for bias and precision at the respective concentration level. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001.

relevant in brain death diagnosis (assay I),20 enantiomers of 3 amphetamine-derived designer drugs (assay II),21 18 amphetamine- and piperazine-derived designer drugs (assay III),22 15 neuroleptics and 3 of their metabolites (assay IV),23 22 β-blockers (assay V),24 and finally 23 benzodiazepines, flumazenil, zaleplone, zolpidem, and zopiclone (assay VI).25 Key information about these assays is given in Table 1. In the bias and precision experiments of the validation studies of these six assays, quality control (QC) samples containing the analytes at low, medium, and high concentrations had been measured in duplicate20-22,24,25 or triplicate23 on each of 8 days together with daily six-20,21,23-25 or seven-point22 calibration curves. The daily QC data had been calculated via the respective full calibration curves. From these data, the bias data had been estimated as the percent deviation of the mean calculated concentrations at each concentration level from the respective nominal concentrations. The intermediate precision data had been estimated from the same QC data using the method of one-way analysis of variance (ANOVA) with day as the grouping variable.26-28 These bias and intermediate precision data were termed “full calibration data” and systematically compared to the respective one-point calibration data estimated in the present study as described below.

The one-point calibration data were estimated using the same raw data used for the full calibration data. For this purpose, six more or less equidistant calibrators of the daily calibration curves were used as individual one-point calibrators termed A to F in rising concentration order (second lowest calibrator of assay III excluded to reduce the number of one-point calibrators from seven to six). Then, daily QC data were calculated using the respective one-point calibrator A and linear through zero calibration. The same was repeated for calibrators B through F. For each of the resulting datasets, bias and intermediate precision were estimated for each QC concentration level as described above for full calibration. The analytes 3,4-methylenedioxybenzylpiperazine (MDBP) from assay III and bromazepam from assay VI were excluded from the present study, because they had failed to meet the acceptance criteria for bias and intermediate precision even with full calibration.22,25 The obtained bias and intermediate precision data were compared to the acceptance criteria established for bioanalytical methods, i.e., ±15% for bias [±20% near the lower limit of quantification (LOQ)] and coefficient of variation (CV) ≤ 15% for

(20) Peters, F. T.; Jung, J.; Kraemer, T.; Maurer, H. H. Ther. Drug Monit. 2005, 27, 334-344.
(21) Peters, F. T.; Samyn, N.; Lamers, C.; Riedel, W.; Kraemer, T.; de Boeck, G.; Maurer, H. H. Clin. Chem. 2005, 51, 1811-1822.
(22) Peters, F. T.; Schaefer, S.; Staack, R. F.; Kraemer, T.; Maurer, H. H. J. Mass Spectrom. 2003, 38, 659-676.
(23) Kratzsch, C.; Tenberken, O.; Peters, F. T.; Weber, A. A.; Kraemer, T.; Maurer, H. H. J. Mass Spectrom. 2004, 39, 856-872.
(24) Maurer, H. H.; Tenberken, O.; Kratzsch, C.; Weber, A. A.; Peters, F. T. J. Chromatogr., A 2004, 1058, 169-181.
(25) Kratzsch, C.; Weber, A. A.; Peters, F. T.; Kraemer, T.; Maurer, H. H. J. Mass Spectrom. 2003, 38, 283-295.
(26) Peters, F. T. In Applications of Liquid Chromatography-Mass Spectrometry in Toxicology, 1st ed.; Polettini, A., Ed.; Pharmaceutical Press: London, 2006; Chapter 4.
(27) Massart, D. L.; Vandeginste, B. G. M.; Buydens, L. M. C.; De Jong, S.; Lewi, P. J.; Smeyers-Verbeke, J. In Handbook of Chemometrics and Qualimetrics: Part A, 1st ed.; Vandeginste, B. G. M., Rutan, S. C., Eds.; Elsevier: Amsterdam, 1997; Chapter 13.
(28) Krouwer, J. S.; Rabinowitz, R. Clin. Chem. 1984, 30, 290-292.

Figure 2. Box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) for assay II as obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high (right panels) QC concentrations. The whiskers represent the lower and upper extreme value, the boxes the interquartile ranges, and the lines within the box the median values (n = 6). The dotted lines indicate the acceptance limits for bias and precision at the respective concentration level. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001.

intermediate precision (≤20% near LOQ).29,30 Furthermore, the bias and intermediate precision data at each QC concentration level as obtained with the six one-point calibrators were statistically compared to the respective data obtained with full calibration after replacing all negative signs of bias values by positive signs. The nonparametric repeated-measures Friedman test31 followed by Dunn’s multiple comparison test32 was used for statistical comparison. Finally, for identification of the best one-point calibrators, bias values at the three QC concentrations were summed up for each analyte in each of the seven datasets (full calibration and A through F) after replacing negative signs by positive signs. The same was done for the intermediate precision values. Thereafter, the resulting sums were compared by the Friedman test31 followed by Dunn’s multiple comparison test.32 The one-point calibrators associated with the smallest rank sum differences from the respective full calibration data were considered the most appropriate one-point calibrators. All statistical calculations were performed with GraphPad Prism 3.02 software, and P < 0.05 was considered statistically significant.

(29) Shah, V. P.; Midha, K. K.; Findlay, J. W.; Hill, H. M.; Hulse, J. D.; McGilveray, I. J.; McKay, G.; Miller, K. J.; Patnaik, R. N.; Powell, M. L.; Tonelli, A.; Viswanathan, C. T.; Yacobi, A. Pharm. Res. 2000, 17, 1551-1557.
(30) U.S. Department of Health and Human Services, Food and Drug Administration. Bioanalytical Method Validation. http://www.fda.gov/cder/guidance/index.htm (accessed May 17, 2007).
(31) Friedman, M. J. Am. Stat. Assoc. 1937, 32, 675-701.
(32) Dunn, O. J. Technometrics 1964, 6, 241-252.

RESULTS

Figures 1-6 show box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high QC concentrations (right panels) for assays I-VI, respectively. The dotted lines indicate the acceptance limits for these parameters. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001. In assay I, the acceptance criteria were fulfilled for all analytes at all QC concentrations when calibrator D was used. With calibrators C and E, only a single bias value was slightly outside the acceptance limit, namely, that for thiopental at the high QC concentration (15.6%) and that for phenobarbital at the low QC concentration (20.6%), respectively. The other one-point calibrators were associated with more bias values outside the limits. With respect to intermediate precision, the acceptance criteria were always fulfilled no matter which one-point calibrator was used. In assay II, all bias values lay within the acceptance limits when calibrators B, C, or F were used, whereas with calibrators D and E only the bias values for S-MDA lay outside with 22.3% and 21.0%, respectively. The intermediate precision data fulfilled the acceptance criteria with all calibrators. In assay III, there was no one-point calibrator with which all bias values met the acceptance criteria. The best performance was obtained with calibrators D-F that led to acceptable bias values for all analytes at medium and high QC concentrations and for all analytes but amphetamine at low QC concentrations (22.1%, 22.7%, and 26.7%, for D, E, and F, respectively). In the intermediate precision evaluation, the acceptance criteria were only fulfilled for all analytes when calibrator B was used. With calibrator D, only

Figure 3. Box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) for assay III as obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high (right panels) QC concentrations. The whiskers represent the lower and upper extreme value, the boxes the interquartile ranges, and the lines within the box the median values (n = 17). The dotted lines indicate the acceptance limits for bias and precision at the respective concentration level. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001.

the value for amphetamine at the medium QC concentration was barely outside the limit (15.2%). For all other calibrators, at least two intermediate precision values were above the acceptance limit. In assay IV, the acceptance criteria for bias were not fulfilled for all analytes with any of the one-point calibrators. The best results were achieved with calibrator D, which yielded bias values outside the acceptance limit for clozapine-N-oxide (25.2%) and fluphenazine (27.7%) at low QC concentrations and for zuclopenthixol at medium and high QC concentrations (15.6% and 15.1%, respectively). The next best was calibrator F with four values outside the acceptance limit at low QC concentrations, namely, for droperidol (23.1%), clozapine-N-oxide (41.2%), perazine (23.2%), and fluphenazine (20.4%), and one bias value outside the acceptance limit at medium concentrations, namely, for clozapine-N-oxide (29.1%). Calibrators B and D performed best with respect to intermediate precision, but in both cases six intermediate precision values did not meet the acceptance criteria. For calibrator B, they were those for clozapine-N-oxide (32.8%) and fluphenazine (22.7%) at low QC concentrations, that for clozapine-N-oxide (29.2%) at the medium QC concentration, and those for clozapine-N-oxide (24.9%), flupenthixol (16.7%), and olanzapine (22.4%) at high QC concentrations. For calibrator D, they were those for clozapine-N-oxide (39.3%), fluphenazine (25.4%), and perazine (20.4%) at low QC concentrations, that for clozapine-N-oxide (32.9%) at the medium QC concentration, and those for clozapine-N-oxide (28.0%) and olanzapine (19.5%) at high QC concentrations. In assay V, the bias values fulfilled the acceptance criteria for all analytes at medium and high QC concentrations when calibrator D or E was used. At low QC concentrations, two bias values

were outside the acceptance limit with calibrator D, namely, those for esmolol (23.6%) and nadolol (26.1%), and three with calibrator E, namely, those for betaxolol (20.6%), esmolol (24.6%), and nadolol (24.8%). No calibrator yielded intermediate precision data fulfilling the acceptance criteria at all QC concentration levels. At high QC concentrations, all values were within the acceptance limit with calibrator E and only one was barely outside with calibrator D, namely, that for penbutolol (16.0%). At medium QC concentrations, calibrators C, D, E, and F yielded intermediate precision data that fulfilled the acceptance criterion for all analytes except esmolol, for which the values were 19.8%, 16.6%, 20.3%, and 21.0%, respectively. At low QC concentrations, the intermediate precision data failed the acceptance criterion for six and seven analytes with calibrators C and D, respectively, and even more with the other calibrators. With calibrators C and D, the values were 33.1% and 32.2% for bupranolol, 22.8% and 23.3% for carvedilol, 28.2% and 24.8% for esmolol, 41.9% and 48.9% for nadolol, 28.2% and 34.0% for nebivolol, and 28.9% and 27.3% for penbutolol, respectively. In addition, the value for oxprenolol (21.2%) was outside the limit with calibrator D. In assay VI, calibrators D and E performed well at medium and high QC concentrations. For calibrator D, only the values for nordiazepam were outside the acceptance limit with 19.8% and 15.3% at medium and high QC concentrations, respectively. For calibrator E, the respective nordiazepam data were 27.3% and 23.2%, and the value for flumazenil (16.0%) at the medium QC concentration was also outside the limit. At the low QC concentrations, the bias values for a number of analytes failed the acceptance criteria with calibrators D and E. The values were 51.0% and 50.8%

Figure 4. Box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) for assay IV as obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high (right panels) QC concentrations. The whiskers represent the lower and upper extreme value, the boxes the interquartile ranges, and the lines within the box the median values (n = 18). The dotted lines indicate the acceptance limits for bias and precision at the respective concentration level. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001.

for medazepam, 150.7% and 154.8% for oxazepam, 57.2% and 60.9% for prazepam, 73.1% and 74.2% for temazepam, 48.6% and 43.3% for tetrazepam, 34.4% and 38.5% for zaleplone, and 70.0% and 79.8% for zopiclone, respectively. In addition, the value for triazolam (23.2%) failed the acceptance limit with calibrator D and that for flumazenil (21.3%) with calibrator E. The intermediate precision data were within the required limits at medium and high concentrations when calibrators D, E, or F were used, but as for bias the situation was much less favorable at low QC concentration. Here, 8, 10, and 11 values failed the acceptance limit with calibrators D, E, and F, respectively. With calibrator D, the values were as follows: 23.6% for clonazepam, 35.6% for desalkylflurazepam, 78.8% for medazepam, 48.7% for nitrazepam, 24.0% for oxazepam, 54.8% for triazolam, 27.8% for zaleplone, and 50.5% for zopiclone. Statistical comparison of full calibration and one-point calibration with calibrators A-F by the Friedman test at three different QC concentration levels yielded significant results with exception of the bias and intermediate precision data of assay I at the low QC level. The P-values were generally below 0.01. The results of the post-test comparison of all one-point calibration results with those of the respective full calibration using Dunn’s multiple comparison test are given in Figures 1-6. The only calibrators for which the results were fairly consistent over all six assays were one-point calibrators A and D. For calibrator A, at least four of the six comparisons in each assay showed a significant result. With calibrator D, no statistically significant differences between full calibration and one-point calibration were found in assays I-III. In the other assays, significant findings for this calibrator were

only observed for the bias values at the low QC concentrations and for the intermediate precision data at the low QC levels in assays V and VI and the medium QC level in assay V. The bias sums and intermediate precision sums obtained with full calibration and one-point calibration were always found to differ significantly by the Friedman test with P-values generally below 0.01. The results of post hoc analysis by Dunn’s multiple comparison test are listed in Table 2. With the exception of calibrator A, none of the one-point calibrators differed significantly from full calibration with respect to bias sum and intermediate precision sum in assay I. The smallest difference to full calibration was observed for calibrator D. In assay II, calibrators B, C, and D were not significantly different from full calibration. The observed rank sum differences were essentially similar for all three. Calibrators B and D showed a somewhat lower value with respect to bias sum, whereas calibrator C had a lower value for intermediate precision sum. In assay III, calibrators D through F were not significantly different from full calibration and calibrator E led to the smallest rank sum differences for both bias sum and intermediate precision sum. In assay IV, calibrators B, D, and F were not significantly different from full calibration. Calibrator D was associated with the lowest rank sum difference for bias sum, whereas calibrator B was associated with the lowest rank sum difference for intermediate precision sum. In assay V, all one-point calibrators differed significantly from full calibration regarding intermediate precision sum, whereas no significant difference was observed between calibrators D through F and full calibration regarding bias sum. With calibrator D the rank sum differences were smallest for both parameters. In assay VI, significant differences between one-point

Figure 5. Box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) for assay V as obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high (right panels) QC concentrations. The whiskers represent the lower and upper extreme value, the boxes the interquartile ranges, and the lines within the box the median values (n = 22). The dotted lines indicate the acceptance limits for bias and precision at the respective concentration level. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001.

Figure 6. Box and whisker plots of bias (upper panels) and intermediate precision data (lower panels) for assay VI as obtained with full and one-point calibrations A-F at low (left panels), medium (middle panels), and high (right panels) QC concentrations. The whiskers represent the lower and upper extreme value, the boxes the interquartile ranges, and the lines within the box the median values (n = 26). The dotted lines indicate the acceptance limits for bias and precision at the respective concentration level. Asterisks indicate statistical significance in comparison to full calibration as obtained by Dunn’s multiple comparison test: one asterisk, P < 0.05; two asterisks, P < 0.01; three asterisks, P < 0.001.

calibration and full calibration were observed for all one-point calibrators. The smallest rank sum differences were observed for

calibrators D and E with respect to bias sum and for calibrator D with respect to intermediate precision sum.
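The comparison reported in this section (ranking the calibrations within each analyte, a Friedman test, and a Dunn-type comparison of each one-point calibrator against full calibration) can be illustrated with a minimal sketch. It uses one common formulation of the post test, in which the rank sum difference is divided by sqrt(n·k·(k+1)/6) and the P-value is Bonferroni-adjusted for the k-1 comparisons; whether the software used for the study applied exactly this variant is an assumption. The function names, the sign convention (full minus one-point), and the small data table are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1..k; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def friedman_rank_sums(table):
    """table[analyte][calibration] -> (rank sums, Friedman chi-square).

    The statistic is the classical form without a correction for ties.
    """
    n, k = len(table), len(table[0])
    rank_sums = [0.0] * k
    for row in table:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return rank_sums, chi2


def dunn_vs_control(rank_sums, n, control=0):
    """Compare every column with the control column (assumed: full calibration).

    Returns {column: (rank sum difference, Bonferroni-adjusted two-sided P)}.
    A negative difference means the one-point calibration ranked worse.
    """
    k = len(rank_sums)
    standard_error = (n * k * (k + 1) / 6.0) ** 0.5
    comparisons = k - 1
    result = {}
    for col in range(k):
        if col == control:
            continue
        diff = rank_sums[control] - rank_sums[col]
        z = abs(diff) / standard_error
        p = min(1.0, 2.0 * (1.0 - NormalDist().cdf(z)) * comparisons)
        result[col] = (diff, p)
    return result


# Hypothetical absolute bias sums (%): columns are full, calibrator A, calibrator D.
abs_bias = [
    [3.0, 12.0, 4.0],
    [5.0, 20.0, 4.5],
    [2.0, 9.0, 3.0],
    [6.0, 15.0, 7.0],
]
rank_sums, chi2 = friedman_rank_sums(abs_bias)
post_test = dunn_vs_control(rank_sums, n=len(abs_bias))
print(rank_sums, chi2)
print(post_test)
```

In this toy table, the column standing in for calibrator A has a large negative rank sum difference and a small adjusted P-value, whereas the column standing in for calibrator D stays close to full calibration. This is the pattern used in the text to pick candidate calibrators.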

Table 2. Rank Sum Differences and P-Values Obtained from Comparison of Full Calibration with One-Point Calibrations A-F in Assays I-VI by Dunn’s Multiple Comparison Test^a

                        bias sum                      intermediate precision sum
assay   calibration     rank sum                      rank sum
no.     comparison      difference    significance    difference    significance

I       A vs full        -30.5        P < 0.001        -23.5        P < 0.05
        B vs full        -19.5        ns               -17.0        ns
        C vs full         -7.5        ns                -9.0        ns
        D vs full         -3.0        ns                -0.5        ns
        E vs full         -6.5        ns                -5.5        ns
        F vs full        -13.5        ns               -18.0        ns

II      A vs full        -32.00       P < 0.001        -25.0        P < 0.01
        B vs full        -10.0        ns               -14.0        ns
        C vs full        -13.0        ns                -8.5        ns
        D vs full        -10.0        ns               -13.0        ns
        E vs full        -21.0        P < 0.05          -3.0        ns
        F vs full        -26.0        P < 0.01           7.5        ns

III     A vs full         -6.5        ns               -60.0        P < 0.001
        B vs full        -30.0        ns               -49.0        P < 0.001
        C vs full        -40.5        P < 0.05         -23.0        ns
        D vs full        -17.5        ns               -20.0        ns
        E vs full        -11.5        ns               -14.0        ns
        F vs full         11.5        ns               -23.0        ns

IV      A vs full        -98.5        P < 0.001        -84.0        P < 0.001
        B vs full        -30.0        ns               -15.0        ns
        C vs full        -47.5        P < 0.01          -2.0        ns
        D vs full        -25.5        ns               -31.0        ns
        E vs full        -80.5        P < 0.001        -62.0        P < 0.001
        F vs full        -29.5        ns               -23.0        ns

V       A vs full        -99.0        P < 0.001       -128.0        P < 0.001
        B vs full        -79.0        P < 0.001       -103.5        P < 0.001
        C vs full        -76.0        P < 0.001        -47.0        P < 0.01
        D vs full        -20.0        ns               -57.0        P < 0.001
        E vs full        -33.0        ns               -65.0        P < 0.001
        F vs full        -36.0        ns               -61.5        P < 0.001

VI      A vs full       -106.5        P < 0.001       -148.0        P < 0.001
        B vs full        -86.5        P < 0.001       -100.5        P < 0.001
        C vs full        -69.0        P < 0.001        -61.5        P < 0.001
        D vs full        -46.5        P < 0.05         -47.5        P < 0.05
        E vs full        -45.0        P < 0.05         -73.5        P < 0.001
        F vs full        -66.5        P < 0.001        -87.0        P < 0.001

^a Abbreviation: ns, not significant.

DISCUSSION

Multiple-point calibration is associated with a comparatively high workload when the number of samples to be analyzed in one batch is low or when even single samples have to be analyzed. Single-sample analysis is usually requested for rarely occurring analytes. In such situations, historic (stored) calibration curves often cannot be used, because the settings of the apparatus are likely to have changed since the last calibration as a result of routine maintenance (e.g., shortening of the column or cleaning the ion source in GC/MS). One-point calibration would be a useful alternative to cumbersome multiple-point calibration, because it can effectively reduce the time and cost of analysis. This is of particular importance when an analytical result is needed as quickly as possible, as in emergency toxicology.

The aim of the present study was to evaluate the feasibility of one-point calibration in multianalyte bioanalytical assays by retrospective analysis of validation data of six such assays. All were MS-based, and the experimental designs in the respective validation studies had all been very similar, allowing the same approach for the comparisons presented here. The methods were selected to represent various constellations commonly occurring in bioanalytical methods (Table 1): half of the studied assays employed GC/MS, the other half LC/MS. The spectrum of analytes ranged from weakly acidic barbiturates (assay I) through neutral to weakly basic benzodiazepines (assays I and VI) to various groups of basic drugs (assays II-V). Sample workup covered liquid-liquid extraction (LLE) in assays I and VI as well as solid-phase extraction (SPE) in assays II-V. Finally, stable-isotope-labeled analogues were used as internal standards (IS) for the majority of analytes (assays I and II), some analytes (assays III and VI), or no analytes at all (assays IV and V).

A total of 4074 individual bias and precision values were included in the study (97 analytes × 7 calibrations × 3 QC levels × 2 parameters). All negative bias values were first converted to their absolute values. This was necessary because the statistical tests used in the present work are based on ranks and would have assigned a lower rank to a bias value of, e.g., -15% than to a bias value of, e.g., 5%. However, the -15% value represents a larger deviation from the nominal value than the 5% value and thus must be assigned the higher rank. After transformation to +15%, the formerly negative value is higher than 5% and is automatically assigned the higher rank.

The bias and precision data were compared to acceptance criteria for bioanalytical assays as established by Shah et al.29 to check whether these quality criteria for modern bioanalytical assays could be fulfilled using one-point calibration. Further statistical analyses were performed to check whether differences between one-point calibration and full calibration were significant or possibly attributable to random variability and to identify the candidates for the best one-point calibrator.
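The effect of the sign transformation on rank-based tests can be shown with a minimal sketch. The three bias values below are hypothetical (only the -15% and 5% figures echo the example in the text), and the helper `ranks` is an illustrative name, not part of any library.

```python
def ranks(values):
    """Return 1-based ranks (1 = smallest); this toy data contains no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


# Hypothetical bias values (%) for one analyte under three calibrations.
signed_bias = [-15.0, 5.0, 8.0]

# Ranking the signed values gives the largest deviation (-15%) the lowest rank.
signed_ranks = ranks(signed_bias)

# Ranking the absolute values gives the largest deviation the highest rank.
absolute_ranks = ranks([abs(b) for b in signed_bias])

print(signed_ranks)    # [1, 2, 3]
print(absolute_ranks)  # [3, 1, 2]

# Size of the dataset as stated in the text.
total_values = 97 * 7 * 3 * 2
print(total_values)    # 4074
```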
The nonparametric repeated-measures Friedman test31 was chosen to check for significant effects of the grouping variable “calibration”. A nonparametric test was necessary because the distributions of the studied populations were clearly not Gaussian (Figures 1-6) and because nonparametric tests are less affected by outlying values. A repeated-measures test was necessary to account for variability caused by the grouping variable “analyte”. The posthoc Dunn’s multiple comparison test32 was performed to check which of the one-point calibrations were significantly different from full calibration. Furthermore, the rank sum differences calculated by this test were used to identify the best one-point calibrator candidates, assuming that small rank sum differences (neglecting signs) indicate small differences from full calibration and hence an appropriate one-point calibrator.

Generally speaking, the results of one-point calibration were good as long as calibrator level C or higher was used for calibration, confirming earlier reports on the feasibility of one-point calibration in bioanalysis.1,5-7 Calibrators A and B yielded less favorable results, most probably because changes in the response of a one-point calibrator have an increasing influence on the slope of the linear-through-zero calibration line as the calibrator concentration approaches zero. Not surprisingly, the bias and intermediate precision for the assays with stable-isotope-labeled IS (Figures 1-4) were better than for those in which trimipramine-d3 (Figures 5 and 6) had been used as IS for structurally different analytes. This phenomenon should also be at least partly responsible for the seemingly better performance of the GC/MS-based assays as compared to the LC/MS-based assays, because more stable-isotope-labeled IS were used in the former (Table 1).

Regarding the performance in all six studied assays, calibrator D was the most appropriate calibrator level. The results obtained by one-point calibration with this calibrator were generally not significantly different from those obtained by full calibration, the low QC levels of assays IV-VI and the intermediate precision at the medium QC level of assay V being the only exceptions. In addition, the bias sum and intermediate precision sum data of this calibrator were not significantly different from the respective full calibration data in four out of six assays, and the observed rank sum differences were the lowest or among the lowest in all assays. Finally, the acceptance criteria for bias and precision of bioanalytical assays as established by Shah et al.29 were fulfilled for the vast majority of analytes when using one-point calibrator D. Values not fulfilling the criteria were usually observed at the low QC level and/or were only moderately above the acceptance limits (Figures 3-6). Therefore, these one-point calibration data could still be considered acceptable for emergency toxicology applications, where accurate and precise quantification is less critical except in the context of diagnosis of brain death. However, quite large bias and precision values were observed for some of the β-blockers and benzodiazepines at the low QC concentrations, so that one-point calibration could at best yield semiquantitative data for the respective analytes. Consequently, multiple-point calibration is essential when reliable quantitative data are needed for these analytes.
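The less favorable results with the lowest calibrators (A and B) discussed above can be made plausible with a short worked derivation. It is only a sketch under simplifying assumptions that are not stated in the study: a strictly linear response through zero with true slope b, and a fixed absolute disturbance δ in the measured response of the calibrator at concentration x_c. The symbols b, δ, x_c, and the hatted estimates are introduced here for illustration.

```latex
\begin{align}
  y &= b\,x
    && \text{assumed true response function} \\
  \hat{b} &= \frac{b\,x_c + \delta}{x_c} = b + \frac{\delta}{x_c}
    && \text{one-point slope from a disturbed calibrator response} \\
  \frac{\hat{b} - b}{b} &= \frac{\delta}{b\,x_c}
    && \text{relative slope error, growing as } x_c \to 0 \\
  \hat{x} &= \frac{y}{\hat{b}} = \frac{x}{1 + \delta/(b\,x_c)}
    && \text{back-calculated concentration of a sample at } x
\end{align}
```

Under these assumptions, the same absolute disturbance δ produces a relative slope error that is inversely proportional to the calibrator concentration, and this error carries over to every back-calculated sample concentration in the batch.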


Finally, it must be kept in mind that the presented findings are only valid for the six assays included in the study and are not necessarily transferable to other assays. It is therefore strongly recommended to evaluate the feasibility of one-point calibration, e.g., using the presented approach, before it is applied in routine analysis. Furthermore, one-point calibration should not be used for assays with nonlinear or curvilinear response functions or for concentrations below or above the range of full calibration.

CONCLUSIONS

The presented study shows that the feasibility of one-point calibration can be assessed from existing method validation data and that the proposed statistical analyses are useful for finding an appropriate one-point calibrator. In addition, the data show that one-point calibration with a calibrator moderately higher than the middle of the full calibration range was applicable for reliable quantification of the vast majority of the studied analytes, whereas some could only be determined semiquantitatively.

ACKNOWLEDGMENT

We thank Dr. Peter Wollenberg for many helpful discussions on the statistical methods used in this work and Markus R. Meyer and Andrea E. Schwaninger for their help with the manuscript.

Received for review January 10, 2007. Accepted April 18, 2007. AC070054S