Statistics - ACS Publications - American Chemical Society

Jun 15, 1995 - Statistics. Carl C. Garber. Anal. Chem., 1995, 67 (12), pp 443–448. DOI: 10.1021/ac00108a028. Publication Date: June 1995. ACS Legac...
Statistics
Carl C. Garber, Corning MetPath Clinical Laboratory, One Malcolm Avenue, Teterboro, New Jersey 07608

This review covers recent discussions and advances in three areas: the definition and determination of method performance parameters, the derivation of laboratory operating specifications from stated quality goals, and techniques for estimating reference values. There has also been a healthy discussion of the value of performing statistical tests of significance as opposed to estimating confidence limits for a given error. Readers are encouraged to consult several recently published reference books that give excellent overviews of statistical applications (K1-K4). The last of these is particularly comprehensive, reviewing many basic statistical concepts as they relate to the different disciplines of the clinical laboratory.

METHOD PERFORMANCE PARAMETERS

Detection limit estimates have been a source of confusion because of the variety of practical conventions and theoretical approaches. At the simple end, the limit of detection (LOD) is defined as the analyte concentration at which the signal-to-noise ratio is 3, and the limit of quantitation (LOQ) has commonly been defined as the concentration at which the signal-to-noise ratio is 5. The reciprocal of the latter is a noise-to-signal ratio, or coefficient of variation (CV), of 0.2 (20%), which is also a common definition of the LOQ. A more precise definition of the LOQ is the concentration at which the signal transformation process (application of the calibration coefficients to the signal) yields a value that reflects the known concentration or activity of the analyte within some tolerance. Below this point, one may be able to distinguish the sample from zero but cannot estimate a concentration or activity within acceptable limits of error. An example of this definition is the concentration at which the CV plus the deviation from the theoretical value is less than 20% (K5). On a more statistical basis, the primary approach has been to make a number of replicate measurements of a sample blank, or a sample of “zero” concentration, and to project the limit of detection as some multiple of the standard deviation of the signal above the mean signal. The multiple is taken from a one-sided table of Student’s t, based on the chosen α error and the number of degrees of freedom of the estimate. Here the α error is the probability that the signal from a zero sample lies above the defined limit, i.e., the fraction of the tail of the distribution of “blank” values that lies above the cutoff. Hence, if the multiplier is selected for α = 0.05, then 5% of the time when no analyte is present the signal will be above this limit.
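Below is a minimal Python sketch of the replicate-blank approach. It assumes the blank signals are independent and approximately normal. To stay within the standard library, it computes the one-sided Student's t multiplier numerically (Simpson integration of the t density, then bisection) rather than reading it from a table. The function names and the blank data are illustrative only. The limit is in signal units; converting it to a concentration would need the calibration function, which is not modeled here.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, n=1000):
    """CDF for x >= 0: 0.5 plus a Simpson's-rule integral of the density over [0, x]."""
    h = x / n
    s = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + s * h / 3


def t_one_sided(alpha, df):
    """One-sided critical value t(1 - alpha, df), found by bisection.

    The bracket [0, 50] covers the usual alpha range except df = 1 with very small alpha.
    """
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def detection_limit(blank_signals, alpha=0.05):
    """Signal-domain limit: mean blank signal + t(alpha, n - 1) * SD of the blanks."""
    n = len(blank_signals)
    mean = statistics.mean(blank_signals)
    sd = statistics.stdev(blank_signals)  # sample SD, n - 1 denominator
    return mean + t_one_sided(alpha, n - 1) * sd


if __name__ == "__main__":
    blanks = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10, 0.08, 0.12, 0.11, 0.10, 0.09]  # hypothetical
    for a in (0.05, 0.001):
        print(f"alpha={a}: t={t_one_sided(a, len(blanks) - 1):.3f}, "
              f"limit={detection_limit(blanks, a):.4f}")
```

Lowering α from 0.05 to 0.001 raises the multiplier, and with it the reported limit. This is the apparent loss of "sensitivity" discussed next.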
One source of confusion is the failure to specify the α error when limits of detection are reported; α typically ranges from 0.05 to less than 0.001. With a smaller α error (a smaller tail), an analytical procedure becomes apparently less “sensitive” with respect to its implied ability to detect analyte. This may be by design in applications where it is desirable to minimize false detection of an analyte. Thus, limits of detection are estimated as the 2SD₀ (α = 0.025) or 3SD₀ (α = 0.0013) limits of the blank signal distribution (K6). Another approach to estimating the LOD considers the overlap between the signal distribution for some low, nonzero concentration and the signal distribution for the zero sample (K7). This gives an estimate of the mean concentration (denoted L_D in ref K7) whose lower distribution limit (denoted L_C, the critical limit) coincides with the upper limit of the distribution of signals from the zero sample. As a simplification, one may assume that the two signal distributions have similar dispersion. In that case, the upper limit of the zero distribution is the mean zero signal plus 2SD₀, and the lower limit for the “lowest” concentration is the mean signal of this “lowest” sample minus 2SD (or 2SD₀ if the two SDs are equivalent). L_D therefore lies a total of 4SD₀ above the mean zero signal. This approach has been extended by considering the overlap between the confidence limits of the standard curve and the zero distribution (K8). For four-parameter logistic models of immunoassays, the influence of the variance function (i.e., how an assay's imprecision behaves as a function of concentration) was discussed with respect to estimating the minimum detectable concentration, the reliable detection limit, and the limit of quantitation (K9). The limit of quantitation may be lowered by using weighted regression to determine the calibration parameters, which minimizes the variability in the transformation of signal to concentration (K10). Empirical definitions have been proposed for drug analysis by GC/MS (K6, K11), where the criteria for measuring a compound require the observation of several ion fragments. Consequently, it was recommended that this technique may not lend itself to direct application of these statistically based models. In light of the variety of definitions and approaches, two observations seem appropriate. First, adopting a standard convention for defining and estimating the limit of detection and the limit of quantitation would facilitate meaningful communication within the clinical laboratory community.
On the other hand, recognizing that “one size shoe may not fit all” when a single definition is applied to different technologies, it is recommended that the scientific and manufacturing communities at least specify the definition(s) being used.
The determination of linearity received attention from two groups in response to concerns about the guidelines previously published by the National Committee for Clinical Laboratory Standards (K12). Those guidelines use a statistical t test for lack of fit (LOF), which compares the residual due to LOF with the residual due to pure error. In contrast, an assessment was proposed (K13) in which the polynomial order of best fit is selected by hypothesis testing, as described in standard statistics references (K14), by comparing the root-mean-square residuals of the candidate fits. The highest order that generated coefficients statistically different from zero was considered to fit the data. This polynomial was then evaluated further by comparing its sum of squares with that of the linear function by an F test. In addition, the predicted residual for each concentration or sample in the study was assessed to determine whether it exceeded analytical or clinical requirements (in contrast to the statistical t test of ref K12). The authors point out that their approach is not subject to the imprecision of the assay. This claim must be tempered by the standard error of the mean value for each specimen used in the polynomial regressions and their estimated residuals. With greater imprecision, the uncertainty in the mean values may favor higher-order polynomials, which supply inflection points between randomly varying means. Concepts of “dimensional nonlinearity” and “relative nonlinearity” were presented (K15) to allow an objective comparison with analytical or clinical performance specifications across the analytical range of the assay. A more easily understood approach extrapolates from a linear portion of the assay range into a nonlinear region, allowing confidence limits to be estimated for the magnitude of the deviation due to nonlinearity (K16). This technique is called the LPOm method, after its estimate of error for the “last point off the line.” The extrapolation can be extended to the high end, the low end, or both. The model is also more general in that the specimens prepared for the study need not be equally spaced in analyte concentration. Readers are encouraged to review the discussion cited in ref K17 for a fuller appreciation of the nuances between testing higher-order polynomials and the LPOm approach. The multiple regression technique proposed by Kroll and co-workers in 1987 for estimating interference effects is finding a number of applications. The technique estimates analyte-independent effects, analyte-dependent effects, interferent effects, and analyte/interferent interactive effects (K18, K19), and each coefficient is tested against the null hypothesis with Student’s t test. In the former study (K18), the technique was used to estimate the interference of tyrosine in an enzymic assay for phenylalanine.
The interference effect was found to depend primarily on the interferent, with negligible analyte-independent or interactive effects. In the latter study (K19), a very complex bilirubin interference was characterized for both kinetic and end-point enzymic phenol-aminophenazone peroxidase (PAP) assays of creatinine. The kinetic assay was found to be more susceptible than the end-point method. The technique makes it possible to elucidate the mechanism of an interfering process and, further, to back-calculate true analyte concentrations. The same concept was used to assess hemolysis interference on the multitest Hitachi 717 analyzer (K20). There, correction formulas derived by multiple regression allow corrected test results to be calculated for real-time reporting. An extensive review of the types of interference and the sources of interfering substances was recently published (K21). A model describing the impact of interferences in terms of “total erroneous results” and “charges for erroneous results” has been developed for each test system or instrument in the clinical chemistry laboratory (K22). The model is a function of both the utility of the instrument in the laboratory and its susceptibility to interferences. Interfering substances can also be put to positive use in designing specific tests. One example is the determination of alkaline phosphatase isoenzymes in the presence of multiple specific inhibitors (K23). This simultaneous multicomponent analysis used the multiple regression technique described above to estimate inhibition coefficients for a five-isoenzyme model and a four-isoenzyme model. The four-isoenzyme model assumes no difference in inhibition between the liver and macromolecular isoenzymes. The best combination of inhibition factors was found to include levamisole, phenylalanine, and heat inactivation at 56 and 65 °C.
Partial least-squares analysis, a special multivariate technique, has been used with Fourier transform infrared analysis to identify and quantitate the composition of urinary calculi (K24). Like the previous example, this technique operates on a multicomponent system. However, the independent variables are not fixed but are allowed to be random variables, so a “training” or “calibration” data set is required to select the variables and estimate their regression coefficients. The partial least-squares technique was found to be the most robust of the multivariate methods examined.
Method comparison studies are traditionally summarized by estimates of between-method bias or by one of several regression techniques. A review compared five approaches: ordinary least squares, weighted least squares, the Deming method, a weighted Deming method, and the Passing-Bablok rank procedure (K25). The Deming method was the only approach that gave an unbiased estimate of the slope, while the two weighted regression techniques were the most efficient (lowest root-mean-square error). This confirms previous studies showing the potential for error in ordinary least-squares regression. All areas of the scientific community should move away from this traditionally comfortable approach toward more appropriate and readily available statistical techniques. We again see reminders that Pearson’s correlation coefficient, r, is inappropriate as an indicator of the agreement of two methods (K26). It has been shown previously that this statistic is determined predominantly by the range of the data and only to a lesser degree by the scatter in the data.
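To make the contrast with ordinary least squares concrete, here is a sketch of the unweighted Deming estimator. It assumes constant (homoscedastic) error in both methods and a known ratio, `delta`, of their error variances. The simulated comparison is hypothetical. The weighted variants and the Passing-Bablok procedure reviewed in K25 are not shown.

```python
import math
import random
import statistics


def _moments(x, y):
    """Means and centred sums of squares / cross-products."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def ols(x, y):
    """Ordinary least squares of y on x; assumes x is measured without error."""
    mx, my, sxx, _, sxy = _moments(x, y)
    slope = sxy / sxx
    return slope, my - slope * mx


def deming(x, y, delta=1.0):
    """Unweighted Deming regression.

    delta = (error variance of y) / (error variance of x); delta = 1 gives
    orthogonal regression, suitable when both methods are similarly imprecise.
    """
    mx, my, sxx, syy, sxy = _moments(x, y)
    d = syy - delta * sxx
    slope = (d + math.sqrt(d * d + 4 * delta * sxy * sxy)) / (2 * sxy)
    return slope, my - slope * mx


def simulate(n=2000, sd=10.0, seed=1):
    """Hypothetical comparison: two methods with equal random error and true slope 1."""
    rng = random.Random(seed)
    truth = [rng.uniform(0, 100) for _ in range(n)]
    x = [t + rng.gauss(0, sd) for t in truth]
    y = [t + rng.gauss(0, sd) for t in truth]
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    print("OLS slope:    %.3f" % ols(x, y)[0])
    print("Deming slope: %.3f" % deming(x, y)[0])
```

When both methods carry comparable error, the OLS slope is biased toward zero by roughly var(true)/(var(true) + var(error in x)). The Deming slope stays near the true value. With `delta = 1`, swapping x and y simply inverts the Deming slope, which OLS does not do.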
A different approach to assessing the agreement of two methods has been proposed (K27, K28). In it, between-method performance is judged by a total error derived from the intrasubject relative bias and the test method coefficient of variation. The two components are combined as a root-mean-square error, based on a first-order Taylor series approximation. This model treats the bias as a variance rather than as an offset, or shift in results, independent of the precision, and this treatment underestimates the total error. Further, if only one CV is used to estimate total error, the total error is by definition predicted for only the central 68% of the data of any given population. To predict the total error for 95% of the population, 1.96CV should be used. When the authors applied this concept to a routine but unspecified laboratory assay for total cholesterol, the test results for 37% of the subjects exceeded their maximum total error estimate. Given the comments above, this is not surprising. Furthermore, the finding is inconsistent with the experience that the overwhelming majority of cholesterol assays perform within the regulatory guidelines for proficiency testing. In a multicenter evaluation of plasminogen activator inhibitor-1 (PAI-1) assays (16 laboratories, 7 different types of kits), between-method performance was assessed by a combination of two-way ANOVA and linear regression (K29). Six of the seven kits were found to be equivalent both in precision and in their response to common standard materials. The seventh kit reacted differently with latent and active PAI-1 and therefore could not be calibrated to equivalent performance.
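The underestimation can be checked numerically. The sketch below assumes the error of a single result is normally distributed, with mean equal to the bias and SD equal to the CV (both in percent at one concentration). It compares the fraction of results that each total-error definition actually covers. The function names and numbers are illustrative and are not taken from refs K27 or K28.

```python
import math
from statistics import NormalDist


def te_linear(bias, cv, z=1.96):
    """Total error with bias as an offset: |bias| + z * CV (z = 1.96 targets ~95%)."""
    return abs(bias) + z * cv


def te_rms(bias, cv):
    """Total error with bias treated as a variance component: sqrt(bias^2 + CV^2)."""
    return math.sqrt(bias ** 2 + cv ** 2)


def coverage(te, bias, cv):
    """Fraction of results with |error| <= te, assuming error ~ Normal(bias, cv)."""
    err = NormalDist(mu=bias, sigma=cv)
    return err.cdf(te) - err.cdf(-te)


if __name__ == "__main__":
    bias, cv = 3.0, 3.0  # hypothetical, both in percent
    for name, te in (("root-mean-square", te_rms(bias, cv)),
                     ("bias + 1.96 CV", te_linear(bias, cv))):
        print(f"{name:>17}: TE = {te:.2f}%, covers {coverage(te, bias, cv):.1%} of results")
```

With zero bias, the one-CV limit covers about 68% of results and the 1.96CV limit about 95%, matching the text. Once bias is present, the root-mean-square figure covers noticeably fewer results than either.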

A special type of method comparison asks to what extent two assays give identical results. A review of this type of analysis concluded that two common indexes, the intraclass correlation and the limits of agreement, serve a useful but imperfect role in measuring such agreement (K30). Recently there has been an interesting discussion of the relative merits of tests of significance and confidence limits (K31-K37). Traditionally, conclusions have emphasized tests of significance, such as the χ² test, F test, and t test. The concern is that such tests do not communicate the magnitude of the parameter being evaluated. Confidence intervals supply that information. They allow a test of significance to be viewed in the context of the total variability of the parameter(s) being estimated, rather than as a ratio of parameter estimates.

METHOD PERFORMANCE SPECIFICATIONS

Discussion of quality goals for laboratory tests has increased significantly (K4, K38-K45). Most of these goals are being developed from estimates of within- and between-subject biological variation. From these, the maximum allowable analytical CV and maximum bias are predicted using approaches such as those described in ref K4. Biological variation may be a recommended basis, but other sources for setting quality goals should also be considered, such as regulatory standards and surveys of clinical decision-making scenarios. The objective is that these different sources lead to consistent quality standards (K38-K40). Goals for precision (with and without method bias) have been compared with “state-of-the-art” performance for 14 common analytes in chemistry (K41). Ten showed sufficiently good precision to meet the stated quality requirements, and the remaining 4 are candidates for improvement.
A number of studies have investigated the biological variability of particular analytes. They derive requirements for analytical precision (which should be less than half the within-subject biological variation) and also predict what constitutes a significant difference between sequential test results (the critical difference). One study of the biological variation of prothrombin time and activated partial thromboplastin time collected data from 39 subjects over a 5-month period (K42). Based on the within-subject biological variation, it recommended analytical CVs of less than 2% and estimated critical differences for these two coagulation tests. Similarly, another study determined the within- and between-subject variability of follitropin, lutropin, testosterone, and sex-hormone-binding globulin in 20 men over a 1-year period (K43). A third studied the variation of prolyl endopeptidase and dipeptidyl peptidase in 13 women and 13 men, also over 1 year (K44). Glycohemoglobin is an important marker of control in diabetic individuals. Its intraindividual variation was determined over 28 days (short term) and 84 days (long term) and used to predict precision requirements and critical differences for sequential testing (K45). For a very precise test, a difference of about 10% could be considered significant for stable patients retested over a 4-6-week period. Over a longer interval of 11-13 weeks, a change in the same group of stable patients could not be considered significant unless it exceeded about 20%.
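As a sketch, both quantities above can be computed from a within-subject CV. The precision goal follows the text. The critical-difference formula is an assumption here: the widely used form sqrt(2) · z · sqrt(CVa² + CVi²), which may differ in detail from the calculations in refs K42-K45. The example CV is hypothetical.

```python
import math
from statistics import NormalDist


def max_analytical_cv(cv_within):
    """Precision goal from the text: analytical CV below half the within-subject CV."""
    return 0.5 * cv_within


def critical_difference(cv_analytical, cv_within, prob=0.95):
    """Critical difference (%) between two serial results from a stable individual.

    Assumes the commonly used form sqrt(2) * z * sqrt(CVa^2 + CVi^2) with a two-sided z;
    the sqrt(2) reflects that both results carry analytical and within-subject variation.
    """
    z = NormalDist().inv_cdf(0.5 + prob / 2)
    return math.sqrt(2) * z * math.sqrt(cv_analytical ** 2 + cv_within ** 2)


if __name__ == "__main__":
    cv_i = 4.0  # hypothetical within-subject CV, %
    cv_a = max_analytical_cv(cv_i)
    print(f"analytical CV goal < {cv_a:.1f}%")
    print(f"critical difference at that CV: {critical_difference(cv_a, cv_i):.1f}%")
```

A longer sampling interval would typically increase the within-subject CV. That would be consistent with the larger critical difference reported above for the 11-13-week interval.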

Data collections and analyses such as these let the laboratory give the clinician very valuable interpretive guidance on the significance of changes in reported values. The same approach was used to derive precision requirements for nine tests used in assessing coronary artery disease (K46). Currently available technology was found to achieve the required analytical precision for six of the nine tests. It fell short for the two subfractions of high-density lipoprotein (HDL2, HDL3) and for directly measured LDL cholesterol. From a different perspective, one study examined how the time between tests affects total cholesterol and high-density lipoprotein (HDL) cholesterol results in women (K47). It did so by estimating a parameter called the “semivariogram”: one-half of the average squared difference between samples collected a given time interval apart, here up to 26 days. For cholesterol, the variability between samplings increased for intervals up to 12 days, while for HDL, variability increased over intervals from 1 to 7 days. From these data, the authors estimated the required assay imprecision (CV). They also showed the minimum time interval over which a physician might want to retest a patient, because testing more often does not allow the individual's biological variation to have its full effect. Finally, when quality goals or precision specifications are derived from biological variation, the laboratory must still meet them in routine operation. To do so, it must monitor the ongoing variability of each assay through an appropriate QC system that detects any degradation from the stated performance, ideally 100% of the time. It has long been pointed out that most QC systems cannot detect such changes to this high a degree.
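Below is a minimal sketch of an empirical semivariogram for one subject's serial results. It uses the conventional geostatistical definition: one-half of the mean squared difference between results at each lag. The input format is an assumption, and this is not necessarily the exact estimator used in ref K47. Plotting the values against lag shows where the semivariance levels off, which corresponds to the minimum retest interval described above.

```python
from collections import defaultdict


def semivariogram(days, values):
    """Empirical semivariogram of one subject's serial results.

    For each lag (difference in sampling day), returns one-half of the mean squared
    difference over all pairs of results separated by that lag.
    """
    acc = defaultdict(lambda: [0.0, 0])  # lag -> [sum of squared differences, pair count]
    for i in range(len(days)):
        for j in range(i + 1, len(days)):
            lag = abs(days[j] - days[i])
            acc[lag][0] += (values[j] - values[i]) ** 2
            acc[lag][1] += 1
    return {lag: total / (2 * count) for lag, (total, count) in sorted(acc.items())}


if __name__ == "__main__":
    days = [0, 1, 2, 4, 7, 9, 12, 14]                     # hypothetical sampling days
    chol = [5.1, 5.0, 5.3, 5.4, 5.0, 5.6, 5.2, 5.5]       # hypothetical results, mmol/L
    for lag, gamma in semivariogram(days, chol).items():
        print(f"lag {lag:2d} d: semivariance {gamma:.3f}")
```

Results from several subjects could be pooled by accumulating the per-lag sums and counts across subjects before dividing.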
The model for building the performance capability of a set of QC rules into the estimation of required laboratory performance from stated quality goals has been developed previously. It is so fundamental, however, that it is reproduced here. SE_crit is the maximum systematic shift (in multiples of the SD) that can be allowed, given the total error specification (TE), the routine bias, and the routine SD of the assay. The value 1.65 allows 5% of the results (in the tail of the distribution of measurements) to exceed the total error specification. If only a 1% defect rate were acceptable, this term would change from 1.65 to 2.33.

$$SE_{crit} = \frac{TE - \text{bias}}{SD} - 1.65$$

and

$$RE_{crit} = \frac{TE - \text{bias}}{1.65\,SD}$$

where RE_crit is the maximum increase in imprecision (as a multiple of the routine SD) that can be tolerated and must therefore be detected by the QC system. These equations are a fundamental quality-planning tool that should be adopted to integrate the decision-making processes of laboratory management. They link the method evaluation process, in which precision and bias have been determined (and judged acceptable), with the ongoing quality control process (K48). Given the inputs to these equations, one can select QC rules that provide a stated level of confidence that the quality requirements defined by the total error specification are met on an ongoing basis. The equations provide the quantitative documentation that we all want in managing a laboratory (more on total quality management in a later section). This approach was used to compare the European proposed quality goals for precision and accuracy (based on biological variation) with the goals required in the United States by the CLIA 88 federal regulations for proficiency testing (K49). The model also allowed quality goals derived from surveys of decision-making scenarios to be compared with proficiency testing goals (K50). Goals may be developed at different steps of the overall laboratory testing and interpretation process. It is therefore important to isolate the analytical requirements and then compare them directly with observed laboratory performance. Clearly, QC rules should be selected so as to maximize error detection and minimize false rejection signals, and a number of analyses of different types of QC rules have been made with this objective in mind. One extremely clever approach gave a visual assessment of error detection capability (K51). It projected a bivariate Gaussian probability density function over the rejection region defined by a particular rule for two control observations. The visual perception was augmented by estimating the total error detection capability as the volume under the surface and over the rejection regions. The analysis clearly showed how between-run variability degrades the error detection capability of a QC system, compared with the case of no between-run variation. It also showed that no single rule can detect all errors and that a multirule provides the best error detection when the number of observations is small.
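The following sketch evaluates the critical-error equations given earlier. As an added assumption, it also estimates the probability that a simple 1-ks limit rule rejects a run. This calculation assumes independent Gaussian control results with no between-run variation, the case that the K51 analysis showed to be optimistic. The rule, its limits, and the example numbers are illustrative. Multirules and CUSUM schemes would need their own probability calculations or simulation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def critical_errors(te, bias, sd, z=1.65):
    """SE_crit (shift in SD units) and RE_crit (allowed multiple of the SD).

    Follows the quality-planning equations; bias is used as a magnitude and all
    inputs must be in the same units.
    """
    margin = te - abs(bias)
    return margin / sd - z, margin / (z * sd)


def p_reject(shift, n, limit=3.0, sd_factor=1.0):
    """Probability that at least one of n independent control results falls outside
    +/- limit SDs after a systematic shift (in SD units) and/or an increase in SD by
    sd_factor. Assumes Gaussian results and no between-run variation."""
    p_inside = (STD_NORMAL.cdf((limit - shift) / sd_factor)
                - STD_NORMAL.cdf((-limit - shift) / sd_factor))
    return 1 - p_inside ** n


if __name__ == "__main__":
    te, bias, sd = 10.0, 2.0, 2.0  # hypothetical quality goal and method performance, in %
    se_crit, re_crit = critical_errors(te, bias, sd)
    print(f"SE_crit = {se_crit:.2f} SD, RE_crit = {re_crit:.2f}")
    for limit in (3.0, 2.5):
        print(f"1-{limit}s rule, N=2: "
              f"false rejection {p_reject(0.0, 2, limit):.3f}, "
              f"detection of SE_crit {p_reject(se_crit, 2, limit):.3f}, "
              f"detection of RE_crit {p_reject(0.0, 2, limit, re_crit):.3f}")
```

Comparing false-rejection and detection probabilities at SE_crit and RE_crit across candidate rules and numbers of controls is the kind of selection the text describes. A tighter limit raises detection at the cost of more false rejections.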
A comparison of the error detection capability of common Shewhart-type QC rules with the CUSUM (cumulative sum) technique concluded that the CUSUM is generally more sensitive (K52), which removes the need to choose among several candidate criteria. In response (K53), it was pointed out that the added complexity of a CUSUM technique may not be warranted in all cases, for example when a single simple rule already provides high error detection. It should, however, be considered when the error detection probabilities of Shewhart-type rules are low. A very novel approach to designing QC procedures, or optimizing a previously selected QC rule, uses a genetic algorithm (K54); readers should consult this reference for the theoretical basis of the approach. For a given set of operating conditions and performance requirements, critical errors were estimated for the maximum allowable degradation of the process (a shift or an increase in imprecision). These serve as boundaries for the genetic search for optimized QC rules. Instead of QC rules with nominal limits, such as a 1-3s rule, this process may generate a fractional limit, such as a 1-3.234s rule. Likewise, instead of a multirule such as 1-3S/2-2S/R-4S, it might generate a combination such as 1-3.1S/2-2.33S/R-3.75S.

REFERENCE VALUES

Additional focus is being placed on standardizing assays so that the same reference values can be applied universally as health care delivery systems expand to regional and national scope (K55). On the other hand, the growing number of studies of within-subject and between-subject biological variation (for example, refs K42-K44) implies that within-subject reference values may be more appropriate than traditional group-based reference values. The key indicator for using individualized reference values is the “index of individuality” of a test: the ratio of the within-subject variability to the between-subject variability. If this ratio equals 1, the group reference range applies equally well to monitoring an individual. If the ratio is less than 1, the group reference range may be too wide for monitoring an individual, which decreases diagnostic sensitivity. In that situation, the use of a group reference range may also increase specificity. Caution against overuse of individual-based reference values was recommended (K56), with reference to the original proposal of this concept (K57, K58). Group-based reference values were considered appropriate for index of individuality ratios up to 1.4. As the ratio rises above 1.0, however, diagnostic sensitivity would increase and diagnostic specificity would decrease for certain individuals. As was pointed out, clinicians may not base decisions on statistically derived reference ranges (whether group- or individual-based). Instead, they may make judgments at decision levels that do not coincide with reference range limits. Receiver operating characteristic (ROC) curves are one means of defining appropriate decision levels. An excellent recent review of this approach (K59) shows, with examples, its relation to the likelihood ratio, another approach to setting decision levels. The likelihood ratio is the probability that a particular test result is positive in the presence of disease divided by the probability that it is positive in the absence of disease. Simply stated, it is the ratio of the true-positive fraction to the false-positive fraction, or sensitivity/(1 - specificity). Its value is the slope of the line joining the origin to the corresponding point on the ROC curve.
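Here is a small sketch of the quantities above, using hypothetical data. It computes the index of individuality as a ratio of CVs, plus sensitivity, specificity, and the positive likelihood ratio at each candidate cutoff. Results at or above the cutoff are assumed to be called positive. Geometrically, sens/(1 - spec) at a cutoff is the slope of the line from the ROC origin to that point.

```python
import math


def index_of_individuality(cv_within, cv_between):
    """Ratio of within-subject to between-subject variation (both as CVs)."""
    return cv_within / cv_between


def sens_spec(diseased, healthy, cutoff):
    """Sensitivity and specificity when results >= cutoff are called positive."""
    sens = sum(v >= cutoff for v in diseased) / len(diseased)
    spec = 1 - sum(v >= cutoff for v in healthy) / len(healthy)
    return sens, spec


def positive_likelihood_ratio(sens, spec):
    """True-positive fraction / false-positive fraction = sens / (1 - spec)."""
    if spec == 1:
        return math.inf  # no false positives at this cutoff
    return sens / (1 - spec)


def roc_points(diseased, healthy):
    """(false-positive fraction, true-positive fraction) for every observed cutoff."""
    points = []
    for c in sorted(set(diseased) | set(healthy)):
        sens, spec = sens_spec(diseased, healthy, c)
        points.append((1 - spec, sens))
    return points


if __name__ == "__main__":
    diseased = [5, 6, 7, 8, 9, 9]  # hypothetical results
    healthy = [1, 2, 3, 4, 6, 7]
    print("index of individuality:", index_of_individuality(6.0, 15.0))
    for fpf, tpf in roc_points(diseased, healthy):
        lr = positive_likelihood_ratio(tpf, 1 - fpf)
        print(f"FPF {fpf:.2f}  TPF {tpf:.2f}  LR+ {lr:.2f}")
```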
A brief letter (K60) reminds us of the importance of taking analytical variability into account when estimating likelihood ratios. The concern was first raised in 1979 but not developed further (K61). When multiple tests are performed, the overall diagnostic sensitivity and specificity depend on the testing procedure (parallel or series testing) and on how independent the tests are of one another. A mathematical model was developed to investigate how increasing between-test correlation affects combined test performance (K62). An interesting extension of the predictive value or likelihood ratio of a test result is the “information value” of a test (K63), which is intended to quantitate the value a test adds. For example, suppose the predictive value of a test result is 90%, and other means, such as a physical examination, already give the patient an 80% probability of disease before testing. The test then raises that probability only by a factor of 90/80 = 1.125, or about 12% more. Taking the base-2 logarithm of the ratio of the posttest probability to the pretest probability makes the information value zero when the two probabilities are equal, as it should be.
Returning to reference values, there have been several interesting discussions of estimating reference ranges from populations of results not obtained from individuals previously characterized by their state of health. Ideally, donors should be carefully selected against predetermined criteria for health status, medication, gender, age, and so on. Because a screening form is an imperfect way to select healthy donors, an exclusion procedure was developed to further eliminate paranormal individuals (K64). It identifies the central distribution of results believed to be free from abnormal values and then extrapolates it to upper and lower limits for the reference interval. In this case, the authors also correct the limits to compensate for the analytical imprecision of the assay. The effect of excluding results was also investigated in a study of reference intervals for plasma lipids and lipoproteins in 203 carefully selected women and men (K65). Excluding “outlying” values removed 22% of the participants, yet the exclusion process had little to no effect on the reference intervals. In many cases, it is not possible or cost-effective to select healthy donors, and the data base already available to the hospital laboratory is used instead. A brief editorial (K66) discusses the issues in estimating reference ranges indirectly from nonhealthy patients. It recommends selecting hospital patients by means of the medical records. Apparent “outliers” that survived the initial selection because of incomplete or incorrect records should then be deleted. An example of this approach is reported (K67) in which a comprehensive list of excluded diagnoses was developed. The remaining patients in the hospital data base were then used to estimate reference values for hemoglobin, mean corpuscular volume (MCV), and red blood cell count (RBC) by gender and adult age (by decade). Where the clinical status of patients is unknown, several approaches have been described over the years to further refine mixed or unselected data. Recently, an approach that deconvolutes the histogram of patient/donor results into a series of overlapping normal distributions was described (K68).
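The idea of resolving a contaminated histogram into normal components can be sketched with a standard expectation-maximization (EM) fit of a one-dimensional Gaussian mixture. This is a swapped-in, generic technique; ref K68 may use a different fitting algorithm. The sketch assumes that the number of components and rough starting means are supplied (for example, from visible histogram peaks). It also assumes the largest component is the reference population and that its central 95% lies at mean ± 1.96 SD. The simulated data are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_mixture(data, init_means, iters=200):
    """Fit a 1-D mixture of normal distributions by expectation-maximization (EM).

    init_means: starting guesses for the component means (e.g. histogram peaks).
    Returns lists of weights, means, and SDs, one entry per component.
    """
    k, n = len(init_means), len(data)
    mus = list(init_means)
    sds = [statistics.pstdev(data)] * k
    ws = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior probability that each result belongs to each component
        resp = []
        for x in data:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(ws, mus, sds)]
            total = sum(p) or 1e-300
            resp.append([q / total for q in p])
        # M-step: re-estimate the weight, mean, and SD of each component
        for j in range(k):
            r = [row[j] for row in resp]
            nj = max(sum(r), 1e-12)
            ws[j] = nj / n
            mus[j] = sum(ri * x for ri, x in zip(r, data)) / nj
            var = sum(ri * (x - mus[j]) ** 2 for ri, x in zip(r, data)) / nj
            sds[j] = math.sqrt(max(var, 1e-12))
    return ws, mus, sds


def reference_limits(ws, mus, sds, z=1.96):
    """Central 95% limits of the largest component, taken as the reference population."""
    j = max(range(len(ws)), key=ws.__getitem__)
    return mus[j] - z * sds[j], mus[j] + z * sds[j]


def simulate(seed=7):
    """Hypothetical data: 1000 'healthy' N(70, 5) results plus 100 abnormal N(100, 8)."""
    rng = random.Random(seed)
    return [rng.gauss(70, 5) for _ in range(1000)] + [rng.gauss(100, 8) for _ in range(100)]


if __name__ == "__main__":
    ws, mus, sds = fit_mixture(simulate(), init_means=[65, 95])
    print("weights", [round(w, 3) for w in ws], "means", [round(m, 1) for m in mus])
    print("reference limits: %.1f to %.1f" % reference_limits(ws, mus, sds))
```

EM can converge to a poor solution if the starting means are badly chosen, so in practice the fitted components should be checked against the histogram.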
Using total serum protein and lactate dehydrogenase as examples, the histograms were deconvoluted into two and three normal distributions, respectively. By adding abnormal results to an initial known distribution, the impact of contaminating results was studied by simulation. With a contamination rate of 10%, the accuracy of the redetermined reference range was found to be within 1%, and the uncertainty less than 3%, for an initial database of 2000 results. The technique provided consistent estimates of reference limits up to a contamination rate of 50%. A similar technique for deconvoluting a series of normal distributions from a contaminated background was described (K69), based on derivative-domain least-squares analysis: the first derivative of the data is fitted to the first derivative of the sum of a series of normal distributions. Applications were given in the area of flow cytometry. This may be considered an extension of the Bhattacharya procedure, which was modified to estimate reference intervals from hospital patients (K70). Two proposals have been described that allow comparison of different populations of results (for example, a group of employees) with reference value limits (K71, K72). The former presents a nonparametric technique that compares the upper tail of a study group's distribution with that of the reference population. The authors show that as long as the reference group is large, so that the uncertainty in its upper limit is small, fairly small differences in a study group can be detected statistically (even when n is 10 to 25). This is a particularly important observation in light of the United States federal regulatory requirement (CLIA ’88) that most laboratories must demonstrate that a specified reference range is applicable to their laboratory. Similar conclusions were reached using a Monte Carlo sampling technique to compare reference ranges (K72). In nonclinical settings, criteria other than the central 95% are used to set upper reference limits. For example, in sports medicine, the use of human chorionic gonadotropin (HCG) by male athletes to enhance their performance has been banned by the International Olympic Committee. A proposed cutoff for detecting HCG in urine was estimated from the results for 1400 men by calculating the 75th percentile (Q3) plus three times the interquartile range (Q3 - Q1), giving an “extremely unusual” limit of 5 IU/L (K73). This was then doubled to a limit of 10 IU/L to ensure that there were no false-positive results.

TOTAL QUALITY MANAGEMENT IN THE CLINICAL LABORATORY

Fraser (K74) has described a systematic approach to developing the information necessary for the introduction and use of any new test in the clinical laboratory. He emphasized that knowledge of the biological variability of an analyte is fundamental to its utility in clinical practice and that such assessments should be planned into evaluations of new methods or new products, much as for new drug trials. His proposal is divided into five phases that generate biological-variation and clinical data along with the determination of analytical performance, so that diagnostic sensitivity, specificity, predictive values, and outcome measures, including cost/benefit analysis, are determined before the new test is generally applied. This systematic approach lends itself to total management of the planning of clinical research, development, and implementation of new tests. Only those new tests that have demonstrated utility in the detection, diagnosis, and monitoring of disease would be recommended for clinical use. A comprehensive review of the cost benefits of laboratory testing was the topic of the 1993 Clinical Chemistry Forum.
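The diagnostic measures named in Fraser's proposal follow from a 2 × 2 table of test result against disease status. The Python sketch below uses the standard textbook definitions; it is not Fraser's protocol, and the class name, function names, and counts are hypothetical. Because predictive values depend on prevalence, the second function recomputes the positive predictive value for a different prevalence with Bayes' theorem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from comparing a test result with true disease status."""
    tp: int  # diseased, test positive
    fp: int  # not diseased, test positive
    fn: int  # diseased, test negative
    tn: int  # not diseased, test negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def ppv(self) -> float:
        """Positive predictive value at the prevalence in this sample."""
        return self.tp / (self.tp + self.fp)

    def npv(self) -> float:
        """Negative predictive value at the prevalence in this sample."""
        return self.tn / (self.tn + self.fn)


def ppv_at_prevalence(sens: float, spec: float, prevalence: float) -> float:
    """Bayes' theorem: PPV for a population with a different prevalence."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical evaluation: 100 diseased and 900 non-diseased subjects.
table = TwoByTwo(tp=90, fp=45, fn=10, tn=855)
sens, spec = table.sensitivity(), table.specificity()  # 0.90 and 0.95
print(sens, spec, table.ppv(), table.npv())
print(ppv_at_prevalence(sens, spec, prevalence=0.01))  # about 0.15
```

With these hypothetical counts, the same 90% sensitivity and 95% specificity give a positive predictive value of about 0.67 in the study sample but only about 0.15 at 1% prevalence.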
Laboratory testing in the following arenas was reviewed: multiphasic testing (K75), case finding in the ambulatory care setting (K76), and centralized vs distributed testing in the hospital (K77). The costs of screening for hereditary hemochromatosis compared with not screening were modeled in detail (K78). Monitoring of the diabetic patient within the guidelines of the American Diabetes Association (K79) and laboratory testing for alcoholic liver disease (K80) were reviewed as specific examples. Societal issues relating to the use of new technologies were also reviewed (K81-K83). Under the total quality planning model presented above (K74), many of these issues should be addressed before a test is introduced. A graphic approach (K84) using the “freckle” plot has been reported for analyzing testing turnaround time by hour for emergency testing services. This approach identifies when the greatest demands for this type of service occur, pinpoints how the service varies throughout the day, and reveals those incidents that exceed a stated service goal. The causes of the events that exceed the service goal must then be identified and investigated; a Pareto diagram is useful for identifying the most frequent causes, which receive the highest priority for corrective action. Other examples of total quality management and continuous improvement include programs such as the College of American Pathologists-sponsored Q-Probes program, which involves survey-driven investigations performed by many participating laboratories, from which “best practices” are identified (see, for example, K85).

Analytical Chemistry, Vol. 67, No. 12, June 15, 1995
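The data behind a freckle plot and a Pareto diagram can be tabulated in a few lines. This Python sketch assumes that turnaround times are available as (hour of day, minutes) pairs and that causes of late results are logged separately as short labels; the record layout, the 60-minute goal, and all values are hypothetical, and the freckle plot itself (individual turnaround times plotted against time of day) is left to any plotting tool.

```python
from collections import Counter, defaultdict


def exceedance_by_hour(records, goal_min):
    """Summarize turnaround times (TAT) by hour of day.

    records: iterable of (hour_of_day, tat_minutes) pairs.
    Returns {hour: (n_tests, n_over_goal, fraction_over_goal)}.
    """
    tally = defaultdict(lambda: [0, 0])
    for hour, tat in records:
        tally[hour][0] += 1
        if tat > goal_min:
            tally[hour][1] += 1
    return {h: (n, over, over / n) for h, (n, over) in sorted(tally.items())}


def pareto(causes):
    """Rank causes by frequency, with the cumulative percentage of all events."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows, running = [], 0
    for cause, n in counts.most_common():
        running += n
        rows.append((cause, n, 100.0 * running / total))
    return rows


# Hypothetical emergency-test data and a hypothetical 60-minute service goal.
tat_records = [(8, 35), (8, 62), (9, 40), (14, 75), (14, 70), (14, 30)]
summary = exceedance_by_hour(tat_records, goal_min=60)

# Hypothetical log of causes recorded for late results over a longer period.
late_causes = ["delayed transport", "instrument down", "delayed transport",
               "repeat testing", "delayed transport", "instrument down"]
for cause, n, cum_pct in pareto(late_causes):
    print(f"{cause:18s} {n:2d} {cum_pct:5.1f}%")
```

Sorting causes by count and tracking the cumulative percentage is what lets the few most frequent causes stand out for corrective action.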


In summary, the primary applications of statistics in the clinical laboratory relate to the evaluation and monitoring of test performance, and a number of selected applications have been reviewed here. Both of these processes must be related to the requirements of the end user (the clinician) and, to a lesser extent, of the regulatory agency. A model has been developed that links the performance information to the requirements or quality goals. Beyond this, a proposal has been described for a comprehensive process of test development, evaluation, and clinical and economic validation that provides a total quality management framework for optimal efficiency in the planning and delivery of quality testing services.

Carl C. Garber is the Director of Quality Assurance at the Corning MetPath Clinical Laboratory in Teterboro, NJ. He received his B.S. degree in chemistry from the University of Alberta in Edmonton, Canada, and his M.S. and Ph.D. (1976) degrees in analytical chemistry from the University of Wisconsin-Madison. Currently he is the chairholder of the Area Committee on Evaluation Protocols with NCCLS, the National Committee for Clinical Laboratory Standards. His interests include applications of statistics in method evaluation, quality control, and estimation of reference ranges. He has coauthored several chapters in clinical chemistry reference books and continues to provide workshops and seminars on this theme.


LITERATURE CITED

Strike, P. W. Statistical Methods in Laboratory Medicine; Butterworth-Heinemann: Boston, 1991.
Bailar, J. C., III; Mosteller, F., Eds. Medical Uses of Statistics; NEJM Books: Boston, 1992.
Meier, P. W.; Zund, R. E. Statistical Methods in Analytical Chemistry; John Wiley & Sons: New York, 1993.
Haeckel, R., Ed. Evaluation Methods in Laboratory Medicine; VCH Publishers: New York, 1993.
Flesch, G.; Mann, C.; Boss, E.; Lan, M.; De en, P. H.; f)cl:terle, W. J. Chromatogr. B: Biomed. Appl. 1994, 657, 155IVI.

Armbruster, D. A.; Tillman, M. D.; Hubbs, L. M. Clin. Chem. 1994, 40, 1233-1238.
Hanseler, E.; Keller, B.; Keller, H. Clin. Chem. 1994, 40, 2046-2052.
Gautsch, K.; Keller, B.; Keller, H.; Pei, P.; Vonderschmitt, D. J. Eur. J. Clin. Chem. Clin. Biochem. 1993, 31, 433-440.
O’Connell, M. A.; Belanger, B. A.; Haaland, P. D. Chemom. Intell. Lab. Syst. 1993, 20, 97-114.
Szabo, G. K.; Browne, H. K.; Ajami, A.; Josephs, E. G. J. Clin. Pharmacol. 1994, 34, 242-249.
Lawson, G. M. Clin. Chem. 1994, 40, 1218-1219.
Passey, R. B. Evaluation of the Linearity of Quantitative Analytical Methods; EP6-P; National Committee for Clinical Laboratory Standards: Villanova, PA, 1986.
Kroll, M. H.; Emancipator, K. Clin. Chem. 1993, 39, 405-415.

Draper, N. R.; Smith, H. Applied Regression Analysis, 2nd ed.; John Wiley & Sons: New York, 1981; pp 20-40.
Emancipator, K.; Kroll, M. H. Clin. Chem. 1993, 39, 766-772.

Krouwer, J. S.; Schlain, B. Clin. Chem. 1993, 39, 1689-1693.
Emancipator, K. Clin. Chem. 1994, 40, 1783-1785 (letter); Schlain, B.; Krouwer, J. Clin. Chem. 1994, 40, 1785-1786 (response).
Lee, C. Lab. Med. 1993, 24, 170-172.
Eng, C. D.; Delgado, R.; Kroll, M. H. Eur. J. Clin. Chem. Clin. Biochem. 1993, 31, 839-8350.
Jay, D. W.; Provasek, D. Clin. Chem. 1993, 39, 1804-1810.
Kroll, M. H.; Elin, R. J. Clin. Chem. 1994, 40, 1996-2005.
Ryder, K. W.; Glick, M. R. Clin. Chem. 1993, 39, 175-176.
Tillyer, C. R.; Rakhorst, S.; Colley, C. M. Clin. Chem. 1994, 40, 803-810.

Volmer, M.; Bolck, A.; Wolthers, B. G.; de Ruiter, A. J.; Doornbos, D. A.; van der Slik, W. Clin. Chem. 1993, 39, 948-954.
Linnet, K. Clin. Chem. 1993, 39, 424-432.
Kivisto, K. T. Clin. Chem. 1993, 39, 167-168.
Miller, W. G.; McKenney, J. M.; Conner, M. R.; Chinchilli, V. M. Clin. Chem. 1993, 39, 297-304.
Chinchilli, V. M.; Miller, W. G. Clin. Chem. 1994, 40, 464-471.

Declerck, P. J.; Moreau, H.; Jespersen, J.; Gram, L.; Kluft, C. Thromb. Haemostasis 1993, 70, 858-863.



Lee, L. Comput. Biol. Med. 1992, 22, 369-371.
Scialli, A. R. Reprod. Toxicol. 1992, 6, 383-384.
Birnbaum, D.; Sheps, S. B. Infect. Control Hosp. Epidemiol. 1992, 13, 553-555.
Johnson, L. A. Anal. Biochem. 1992, 206, 195-201.
Harris, E. K. Clin. Chem. 1993, 39, 927-928.
Henderson, A. R. Clin. Chem. 1993, 39, 929-935.
%[.in, S.; White, M. D. Am. J. Infect. Control 1993, 21, 210-213.

Altman, D. G. Clin. Chem. 1994, 40, 161-162.
Petersen, P. H.; Fraser, C. G.; Westgard, J. O.; Larsen, M. L. Clin. Chem. 1992, 38, 2256-2260.
Fraser, C. G.; Petersen, P. H. Clin. Chem. 1993, 39, 1447-1453.
Petersen, P. H.; Fraser, C. G. Clin. Chem. 1994, 40, 1865-1868.
Stockl, D. Clin. Chem. 1993, 39, 913-914.
Dot, D.; Miro, J.; Fuentes-Arderiu, X. Ann. Clin. Biochem. 1992, 29, 422-425.
Valero-Politi, J.; Fuentes-Arderiu, X. Clin. Chem. 1993, 39, 1192-1197.

Maes, M.; Scharpe, S.; De Meester, I.; Goossens, P.; Wauters, A.; Neels, H.; Verkerk, R.; De Meyer, F.; D’Hondt, P.; Peeters, D.; Schotte, C.; Cosyns, P. Clin. Chem. 1994, 40, 1686-1691.
Phillipou, G.; Phillips, P. J. Clin. Chem. 1993, 39, 2305-2308.
Pagani, F.; Panteghini, M. Clin. Biochem. 1993, 26, 415-420.
Choudhury, N.; %all, P. M. L.; Truswell, A. S. Clin. Chem. 1994, 40, 710-715.
zs’,e,e,D. A.; Westgard, J. O. Clin. Chem. 1993, 39, 1504IJIL.

Westgard, J. O.; Seehafer, J. J.; Barry, P. L. Clin. Chem. 1994, 40, 1828212321.
Westgard, J. O.; Seehafer, J. J.; Barry, P. L. Clin. Chem. 1994, 40, 1809-1914.
Parvin, C. A. Clin. Chem. 1993, 39, 440-447.
Bishop, J.; Nix, B. J. Clin. Chem. 1993, 39, 1638-1649.
Westgard, J. O. Clin. Chem. 1994, 40, 499-500.
Hatjimihail, A. T. Clin. Chem. 1993, 39, 1972-1978.
Groth, T.; de Verdier, C. H. Clin. Chim. Acta 1993, 222, 129-lJ3.

Vasikaran, S. D. Ann. Clin. Biochem. 1993, 30, 594-595.
Harris, E. K. Clin. Chem. 1974, 20, 1535-1542.
Fraser, C. G.; Harris, E. K. Crit. Rev. Clin. Lab. Sci. 1989, 27, 409-437.
Zweig, M. H.; Campbell, G. Clin. Chem. 1993, 39, 561-577.
Malvano, R.; Chiecchio, A.; Ferdeghini, M. Clin. Chem. 1993, 39, 697-698.
Van der Helm, H. J.; Hische, E. A. H. Clin. Chem. 1979, 25, 985-988.
Chiecchio, A.; Malvano, R.; Biblioli, F.; Bo, A. Eur. J. Clin. Chem. Clin. Biochem. 1994, 32, 169-175.
Johnson, H. A. Ann. Clin. Lab. Sci. 1993, 23, 159-164.
Richardson-Jones, A.; Twedt, D.; Gibson, M.; Fimlt, N. Z.; Hellman, R. Ann. Clin. Lab. Sci. 1993, 23, 340-349.
Nilsson-Ehle, J.; Lanke, J.; Nilsson-Ehle, P.; Tryding, N.; Schersten, B. Scand. J. Clin. Lab. Invest. 1994, 54, 137-146.
Solberg, H. E. Clin. Chem. 1994, 40, 2205-2206.
Kairisto, V.; Virtanen, A.; Uusipaikka, E.; Rajamski, Kouri A.; Fihneman, H.; Juva, K.; Koivula, T.; ifanto, V. Clin. Chem. 1994, 40, 2209-2215.
Matsuto, T.; Sugita, O.; Kimura, S.; Okada, M. Rinsho Byori 1994, 42, 89-83 (Japanese).
Moore, D. H. Cytometry 1993, 14, 510-518.
Baadenhuijsen, H.; Smit, J. C. J. Clin. Chem. Clin. Biochem. 1985, 23, 829-839.
Van Der Meulen, E.; Boogaard, P. J.; Van Sittert, N. J. Clin. Chem. 1994, 40, 1698-1702.
Holmes, E. W.; Kahn, S. E.; Molnar, P. A.; Bermes, E. W., Jr. Clin. Chem. 1994, 40, 2216-2222.
Laidler, P.; Cowan, D. A.; Hider, R. C.; Kicman, A. T. Clin. Chem. 1994, 40, 1306-1311.
Fraser, C. G. Clin. Chem. 1994, 40, 1671-1673.
Altshuler, C. H. Clin. Chem. 1994, 40, 1616-1620.
ScIw$ein, M.; Boland, B. J. Clin. Chem. 1994, 40, 1621-

(K77) Winkelman, J. W.; Wybenga, D. R.; Tanasijevic, M. J. Clin. Chem. 1994, 40, 162&-1630.
(K78) Buffone, G. J.; Beck, R. Clin. Chem. 1994, 40, 1631-1636.
(K79) Goldstein, D. E.; Little, R. R.; Wiedmeyer, H.-M.; England, J. D.; Rohlfing, C. L.; Wilke, A. L. Clin. Chem. 1994, 40, 1637-1640.

(K80) Rosman, A. S.; Lieber, C. S. Clin. Chem. 1994, 40, 1641-1651.

Holtzman, N. A. Clin. Chem. 1994, 40, 1651-1657.
Muller, B. Clin. Chem. 1994, 40, 1658-1662.
Shimauchi, A. Clin. Chem. 1994, 40, 1663-1667.
Pellar, T. G.; Ward, P. J.; Tuckerman, J. F.; Henderson, A. R. Clin. Chem. 1993, 39, 1054-1059.
Bachner, P.; Howanitz, P. J.; Lent, R. W. Am. J. Clin. Pathol. 1994, 102, 567-571.