Anal. Chem. 1989, 61, 2719-2724


Statistical Estimation of Analytical Data Distributions and Censored Measurements

Kirk K. Nielson* and Vern C. Rogers

Rogers and Associates Engineering Corporation, P.O. Box 330, Salt Lake City, Utah 84110-0330

A numerical method was developed for estimating the shapes of unknown distributions of analytical data and for estimating the expected values of censored data points. The method is based conceptually on the normal probability plot. Data are ordered and then transformed by using a power function to achieve approximate linearity with respect to a computed normal cumulative probability scale. The exponent used in the power transformation is an index of the distribution shape, which covers a continuum on which normality is defined as d = 1 and log-normality is defined as d = 0. Expected transformed values of censored points are computed from a straight line fitted to the transformed, accepted data, and these are then back-transformed to the original distribution. The method gives improved characterization of analytical data distributions, particularly in the distribution extremities. It also avoids the biases that arise from improper handling of censored data from measurements near the analytical detection limit. Illustrative applications were computed for atmospheric SO2 data and for mineral concentrations in hamburgers.
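The procedure summarized in the abstract (order the data, power-transform them, fit a straight line against a normal probability scale, then estimate and back-transform the censored points) can be sketched as below. Because eq 4-6 are not reproduced in this section, the pieces here are assumptions rather than the authors' formulas: a Box-Cox-style transform $(x^d - 1)/d$ (which tends to $\ln x$ as $d \to 0$), Blom plotting positions $(i - 3/8)/(n + 1/4)$ for the normal scores, and a simple grid search that picks the $d$ giving the most linear fit (highest correlation coefficient) in place of the paper's iterative, weighted determination. All function names are hypothetical, and a single detection limit is assumed so that every censored point ranks below every accepted one.

```python
import math
from statistics import NormalDist


def power_transform(x, d):
    """Assumed Box-Cox-style family: d = 1 is linear (normal), d = 0 is ln x (log-normal)."""
    if abs(d) < 1e-12:
        return math.log(x)
    return (x ** d - 1.0) / d


def inverse_power_transform(y, d):
    """Back-transform a value from the power-transformed scale to the original scale."""
    if abs(d) < 1e-12:
        return math.exp(y)
    base = d * y + 1.0
    if base <= 0:
        raise ValueError("fitted value lies outside the range of the power transform")
    return base ** (1.0 / d)


def normal_scores(n):
    """Standard normal quantiles at Blom plotting positions (i - 3/8)/(n + 1/4), i = 1..n."""
    nd = NormalDist()
    return [nd.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def fit_line(z, y):
    """Ordinary least-squares fit y = a + b*z; returns (a, b, correlation r)."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    syy = sum((yi - my) ** 2 for yi in y)
    szy = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
    b = szy / szz
    return my - b * mz, b, szy / math.sqrt(szz * syy)


def estimate_censored(accepted, n_censored, d_grid=None):
    """Choose d that makes the accepted data most linear on the normal scale, then
    estimate the censored (lowest-ranked) points from the fitted line and back-transform."""
    if d_grid is None:
        d_grid = [k / 20 for k in range(-60, 61)]  # d from -3 to 3 in steps of 0.05
    n = len(accepted) + n_censored
    z_all = normal_scores(n)
    z_acc = z_all[n_censored:]  # censored points occupy the lowest ranks
    x_sorted = sorted(accepted)
    best = None
    for d in d_grid:
        y = [power_transform(x, d) for x in x_sorted]
        a, b, r = fit_line(z_acc, y)
        if best is None or r > best[3]:
            best = (d, a, b, r)
    d, a, b, _ = best
    estimates = [inverse_power_transform(a + b * z, d) for z in z_all[:n_censored]]
    return d, a, b, estimates
```

A call such as `estimate_censored(values_above_ld, n_below_ld)` returns the selected index, the intercept and slope of the line on the transformed scale, and back-transformed estimates for the censored points, lowest first. When the fitted line extrapolates below $-1/d$ (for $d > 0$) there is no real back-transform, and the sketch raises an error instead of returning a value.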

INTRODUCTION

In environmental surveillance of chemical compounds, trace elements, and radionuclides, analytical measurements often span very wide ranges but still may represent a single population. To represent the population by its measured parameters, a central value (mean, median, etc.) and a distribution width (standard deviation, range, etc.) generally are sought for simplicity and convenience in statistical analyses. Two common causes of subjective bias in representing the population are the choice of distribution attributed to it and the method of dealing with censored data points. The distribution often is simply assumed to be normal or log-normal, even though it may have an intermediate or alternative shape. Even when distributions are analyzed by using normal or log-normal probability plots, their shapes still are usually assessed subjectively for linearity and for approximation to normality or log-normality. While the biases resulting from inaccurate distribution assumptions affect both central value and width estimates, they can be particularly misleading when one is making decisions about confidence intervals or compliance with prescribed limits. The problem of censored data points also is common in environmental surveillance, where vanishingly small concentrations frequently lead to observations that are less than the analytical limit of detection (LD). As concentrations approach zero, the combined measurement uncertainties in samples and blanks can even cause a fraction of observations to be negative. Although there are numerical definitions of the appropriate LD for data acceptability (1-4), there is less consistency in reporting and interpreting the lower, censored measurements. They variously are ignored or reported as zero, as …

Table II

| Element | LD^a | n^b | Mean^c ± SD | Distribution index^d | Rel. fit uncert.^e (%) | Power-transformed median^c | Power-transformed slope^c | Log-normal median^c | Log-normal slope^c |
|---|---|---|---|---|---|---|---|---|---|
| Al | 110 | 92 | 163 ± 37 | 0.6 ± 0.7 | 1.7 | 147 | 44 | 146 | 41 |
| Si | 50 | 36 | 79 ± 39 | -0.1 ± 0.5 | 8.9 | 37 | 21 | 35 | 23 |
| P | 40 | 112 | 1920 ± 350 | 0.6 ± 0.5 | 1.9 | 1900 | 350 | 1880 | 350 |
| S | 30 | 112 | 2100 ± 280 | 1.8 ± 0.6 | 2.9 | 2120 | 270 | 2080 | 290 |
| Cl | 100 | 112 | 11800 ± 1800 | -2.3 ± 0.7 | 1.9 | 11400 | 1400 | 11700 | 1600 |
| K | 60 | 112 | 3190 ± 780 | -1.6 ± 0.5 | 1.9 | 3030 | 540 | 3120 | 620 |
| Ca | 30 | 112 | 1490 ± 450 | 1.2 ± 0.3 | 12.3 | 1500 | 430 | 1420 | 450 |
| Mn | 2 | 109 | 4.3 ± 1.2 | 0.1 ± 0.4 | 3.0 | 4.1 | 1.1 | 4.1 | 1.1 |
| Fe | 2 | 112 | 45 ± 8 | 0.2 ± 0.5 | 1.3 | 45 | | 44 | 8 |
| Cu | 0.8 | 46 | 1.3 ± 0.5 | 4.1 ± 0.8 | 5.3 | 0.7 | 0.4 | 0.7 | 0.4 |
| Zn | 0.8 | 112 | 34 ± 9 | 0.2 ± 0.3 | 2.5 | 32 | 9 | 32 | 9 |
| Br | 0.6 | 112 | 30 ± 11 | 1.1 ± 0.2 | 18.9 | 30 | 11 | 28 | 12 |
| Rb | 0.6 | 112 | 3.7 ± 2.3 | -0.5 ± 0.2 | 7.4 | 3.1 | 1.3 | 3.3 | 1.4 |
| Sr | 1 | 112 | 5.4 ± 2.3 | -0.9 ± 0.3 | 14.1 | 4.8 | 1.6 | 5.0 | 1.8 |

^a Analytical limit of detection based on X-ray peak counting statistics. ^b Number of measured values above the LD. ^c Concentrations and standard deviations in ppm dry weight. ^d Computed from eq 4-6 and 8. ^e Computed as $\left\{\sum[(Y - Y_{\mathrm{fit}})/Y]^2/(n - 1)\right\}^{0.5}$.
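Footnote e of Table II defines the relative fit uncertainty. A minimal sketch, assuming that Y denotes the observed values, Y_fit the corresponding values read from the fitted line, and that the result is multiplied by 100 to give the percentage listed in the table (the function name is hypothetical):

```python
import math


def relative_fit_uncertainty(observed, fitted):
    """Relative fit uncertainty of footnote e, in percent (assumed scaling):
    100 * sqrt(sum(((Y - Y_fit) / Y) ** 2) / (n - 1)).
    Y = observed values, Y_fit = values from the fitted line (assumed meaning)."""
    n = len(observed)
    if n < 2 or len(fitted) != n:
        raise ValueError("need at least two paired observed/fitted values")
    total = sum(((y - yf) / y) ** 2 for y, yf in zip(observed, fitted))
    return 100.0 * math.sqrt(total / (n - 1))


# Hypothetical values: each point deviates from its fitted value by 1% of Y
print(relative_fit_uncertainty([100.0, 200.0, 400.0], [101.0, 198.0, 404.0]))
```

Dividing by n - 1 rather than n mirrors the usual sample standard deviation, so the statistic reads as a relative standard deviation of the residuals.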

log-normal distribution and to a normal distribution that was transformed by using the distribution index d = 0.301 in eq 4. The distribution index was determined iteratively by using eq 4-6, with the same point weighting indicated in Figure 7. The computed distribution index indicates that the data actually have a distribution shape intermediate between log-normal (d = 0) and normal (d = 1). Although the offset used in the three-parameter log-normal analysis (16) permits a quasi-log-normal fit to the data with a similar median (0.032 for the three-parameter log-normal versus 0.031 for the power-transformed normal), the resulting fit, shown in Figure 7, differs considerably in the extremities of the distribution. At the low end, the negative values suggested by the three-parameter log-normal fit are reduced by the power-transformed fit, and at the high end, the power-transformed fit predicts lower maximum values. If data such as those in Figure 7 were used to assess compliance with an SO2 standard of 0.4 ppm, the three-parameter log-normal distribution would indicate noncompliance about 3 times as often as the power-transformed normal distribution.

In an application to the analysis of mineral concentrations in fast foods, the method was used to estimate distribution shapes and censored data ranges. X-ray fluorescence analyses of 14 minerals in 112 commercially obtained hamburger samples yielded the means and standard deviations presented in Table II for all measurements above the LD. The distribution indices given in Table II were computed for each element by subjecting these data to the distribution analyses defined by eq 4-6. Censored values of Al, Si, Mn, and Cu were computed from fits to the power-transformed data, and the resulting fitted values were transformed back to the original data distributions. The computed minimum values in these distributions were 49, 10, 1.9, and 0.2 ppm for Al, Si, Mn, and Cu, respectively.
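The SO2 compliance comparison amounts to computing an exceedance probability from each fitted distribution. A sketch under stated assumptions: for the power-transformed normal, T(X) is taken to be normal with mean T(median) and a standard deviation sigma_t on the transformed scale, with T the same assumed Box-Cox-style form $(x^d - 1)/d$ (not necessarily the paper's eq 4); for the three-parameter log-normal, ln(X - offset) is taken to be normal. The fitted SO2 parameters are not given in this section, so no real data are used, and all function names are hypothetical.

```python
import math
from statistics import NormalDist


def power_transform(x, d):
    """Assumed Box-Cox-style family: d = 1 linear (normal), d = 0 natural log (log-normal)."""
    if abs(d) < 1e-12:
        return math.log(x)
    return (x ** d - 1.0) / d


def exceedance_power_normal(limit, median, sigma_t, d):
    """P(X > limit) if T(X) is normal with mean T(median) and SD sigma_t.
    T is increasing for every d, so X > limit exactly when T(X) > T(limit)."""
    dist = NormalDist(power_transform(median, d), sigma_t)
    return 1.0 - dist.cdf(power_transform(limit, d))


def exceedance_three_param_lognormal(limit, offset, mu_log, sigma_log):
    """P(X > limit) if ln(X - offset) is normal with mean mu_log and SD sigma_log."""
    if limit <= offset:
        return 1.0  # the whole distribution lies above the offset
    return 1.0 - NormalDist(mu_log, sigma_log).cdf(math.log(limit - offset))
```

With both fitted distributions in hand, the ratio of the two probabilities evaluated at limit = 0.4 ppm is the quantity the text reports as about 3. Because the two models can share nearly the same median yet differ in the tails, this ratio is far more sensitive to the distribution shape than the median is.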
The wide variation in distribution indices for different elements in Table II results from the different modes of occurrence of the minerals and the different populations to which they belong. Bromine and calcium are approximately normally distributed (d ≈ 1), whereas Si, Mn, Fe, Cu, and Zn are nearly log-normally distributed (d ≈ 0). Distributions of Cl, K, Rb, and Sr are skewed positively even more than a log-normal distribution (d < 0), and S is skewed negatively relative to a normal distribution (d > 1). The distributions of Al and P are intermediate between normal and log-normal distributions. For comparison with the arithmetic means and standard deviations, the fitted intercepts (medians) and slopes (standard deviations) of the data also are presented in Table II. For the 10 elements with ranges entirely above the LD, the fitted median values are similar to the means, and the fitted slopes are similar to or slightly less than the standard deviations. For the four elements with ranges partly below the LD, the fitted medians are lower than the means because additional data points are included at the low end of the distribution. Their fitted slopes are generally slightly lower owing to the improved distribution fit, despite a contribution toward higher values from the additional points at the low end of the distributions. The fitting uncertainties are very high for several elements (Si, Ca, Br, Sr). Visual examination of their cumulative probability plots reveals significant nonlinear structure resulting from more than one population. This explains the high fitting uncertainties and indicates that a single-population representation is not appropriate for these elements.

For comparison with the power-transformed medians and slopes (standard deviations), corresponding log-normal distribution fit parameters also are presented in Table II. For the 10 completely measured distributions, the median of the log-normal fit (geometric mean) is lower than the power-transformed median when the distribution index is positive (d ≥ 0.2) and higher when it is negative (d ≤ -0.2). The log-normal slopes are similar to or exceed the power-transformed slopes, indicating unexplained curvature (skewness) in the log-normal fits. These comparisons illustrate the improvement obtained when single-population distribution shapes are defined numerically instead of by an assumed shape. Because all of these distributions are relatively narrow, however, the log-normal parameters also give reasonable values in many cases. This is consistent with the uncertainty relationship in eq 8: distinctions between distribution shapes become increasingly important for distributions that are wide and that are defined by large numbers of data points.
LITERATURE CITED

(1) Currie, L. A. Anal. Chem. 1968, 40, 586-593.
(2) ACS Committee on Environmental Improvement. Anal. Chem. 1980, 52, 2242-2249.
(3) Long, G. L.; Winefordner, J. D. Anal. Chem. 1983, 55, 712A-724A.
(4) ACS Committee on Environmental Improvement. Anal. Chem. 1983, 55, 2210-2218.
(5) Cohen, A. C. Technometrics 1959, 1, 217-237.
(6) Sarhan, A. E.; Greenberg, B. G. Contributions to Order Statistics; Wiley & Sons: New York, 1962.
(7) Harter, H. L.; Moore, A. H. Biometrika 1966, 53, 205-213.
(8) Corley, J. P.; Denham, D. H.; Michels, D. E.; Olsen, A. R.; Waite, D. A. A Guide for Environmental Radiological Surveillance at ERDA Installations; Report ERDA-77-24, Battelle Pacific Northwest Laboratory, 1977.
(9) Miller, M. L.; Fix, J. J.; Bramson, P. E. Radiochemical Analyses of Soil and Vegetation Samples Taken from the Hanford Environs, 1971-1976; Report BNWL-2249, Battelle Pacific Northwest Laboratory, 1977.
(10) Gilbert, R. O.; Kinnison, R. R. Health Phys. 1981, 40, 377-390.
(11) Toy, A. J.; Linkken, C. L. The Implications of Sampling from a Log-Normal Population; Report UCRL-76936, University of California, Lawrence Livermore Laboratory, 1975.
(12) Gumbel, E. J. Statistical Theory of Extreme Values and Some Practical Applications; Applied Mathematics Series 33; National Bureau of Standards: Washington, DC, 1954.
(13) Abramowitz, M.; Stegun, I. A. Handbook of Mathematical Functions; Applied Mathematics Series 55; National Bureau of Standards: Washington, DC, 1970; p 933.
(14) Dixon, W. J.; Massey, F. J. Introduction to Statistical Analysis; McGraw-Hill: New York, 1969.
(15) Hastings, N. A. J.; Peacock, J. B. Statistical Distributions; John Wiley & Sons: New York, 1975; p 100.
(16) Ott, W. R.; Mage, D. T. Comput. Oper. Res. 1978, 3, 209-216.
(17) Mage, D. T.; Ott, W. R. J. Air Pollut. Control Assoc. 1978, 28, 797-798.

RECEIVED for review July 11, 1989. Accepted October 3, 1989. This work was partially funded by NIH Grants 1R43-CA38519 and 9R44-DK38751.

Anal. Chem. 1989, 61, 2724-2730

Selective Detection of Carbon-13, Nitrogen-15, and Deuterium Labeled Metabolites by Capillary Gas Chromatography-Chemical Reaction Interface/Mass Spectrometry

Donald H. Chace and Fred P. Abramson*

Department of Pharmacology, The George Washington University Medical Center, Washington, D.C. 20037

We have applied a new chemical reaction interface/mass spectrometer technique (CRIMS) to the selective detection of 13C-, 15N-, and 2H-labeled phenytoin and its metabolites in urine following separation by capillary gas chromatography. The microwave-powered chemical reaction interface converts materials from their original forms into small molecules whose mass spectra serve to identify and quantify the nuclides that make up each analyte. The presence of each element is followed by monitoring the isotopic variants of CO2, NO, or H2 that are produced by the chemical reaction interface. Chromatograms showing only enriched 13C and 15N were produced by subtracting the abundance of naturally occurring isotopes from the observed M+1 signal. A selective chromatogram of 2H (D) was obtained by measuring HD at m/z 3.0219 with a resolution of 2000. Metabolites representing less than 1.5% of the total labeled compounds could be identified in the chromatogram. Detection limits from urine of 380 pg/mL of a 15N-labeled metabolite, 7 ng/mL of a 13C-labeled metabolite, and 16 ng/mL of a deuterium-labeled metabolite were determined at a signal-to-noise ratio of 2. Depending on the isotope examined, a linear dynamic range of 250-1000 was observed using CRIMS. To identify many of these labeled peaks (metabolites), the chromatographic analysis was repeated with the chemical reaction interface turned off, and mass spectra were obtained at the retention times found in the CRIMS experiment. CRIMS is a new analytical method that appears to be particularly useful for metabolism studies.
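The natural-abundance subtraction described in the abstract can be sketched for the 13C channel, where 13CO2 at m/z 45 is the M+1 partner of 12CO2 at m/z 44. This assumes the correction is a scan-by-scan subtraction of natural_ratio times the M signal from the M+1 signal; the ratio is an input, and the value of about 0.012 in the usage line (13C/12C near 0.011 plus two small 17O contributions of about 0.0004 each) is an approximation for illustration, not a value taken from the paper. The function name and traces are hypothetical.

```python
def natural_abundance_corrected(m_signal, m1_signal, natural_ratio):
    """Enriched-label trace: observed M+1 minus the M+1 expected from natural
    isotopic abundance (natural_ratio * M), computed scan by scan.
    Noise can make individual points slightly negative; they are not clipped."""
    if len(m_signal) != len(m1_signal):
        raise ValueError("M and M+1 traces must have the same number of scans")
    return [m1 - natural_ratio * m for m, m1 in zip(m_signal, m1_signal)]


# Hypothetical traces: m/z 44 (12CO2) and m/z 45 (13CO2 plus natural contributions)
co2_44 = [1000.0, 5000.0, 1000.0]
co2_45 = [12.0, 260.0, 12.0]
print(natural_abundance_corrected(co2_44, co2_45, 0.012))
```

Unlabeled material contributes M+1 signal strictly in proportion to its M signal, so it cancels in the difference, while a labeled analyte leaves a positive residual peak; this is what lets the chromatogram show only the enriched compounds.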


INTRODUCTION

The stable isotopes 13C, 15N, and 2H (D) are often used in biochemical and pharmacological applications (1), particularly in the study of xenobiotic biotransformation (2). Incorporating a stable isotope is intended to make a drug and its labeled metabolites unique. Numerous mass spectrometric methods have been used to detect them. The most common of these has been the twin ion or ion cluster GC/MS technique for the detection and structural identification of drug metabolites in complex biological samples (2, 3). In such a method, the mass spectrum of each chromatographic peak is examined for a characteristic ion pair or cluster, such as M+ and [M + 3]+. However, these techniques are tedious and structure dependent, and therefore they lack generality. Because biological matrices produce large numbers of chromatographic peaks, labeled compounds present in low amounts may go undetected owing to interference from the overlapping mass spectra of other compounds. In contrast, radiolabels such as 14C or 3H have been used in metabolism studies because their methods of detection are not structure dependent. The presence of a radiolabel in a chromatogram, whether gas, liquid, or thin layer, directs the analyst to further investigation of the fractions containing the label, thereby greatly simplifying the investigation. We wished to develop a method that detected stable isotopes in a structure-independent manner similar to the way radiolabels are used. The goals for this method were to preserve the high chromatographic resolution characteristic of capillary columns while being sensitive, highly selective, versatile, and reliable. To this end, we used a chemical reaction interface/mass spectrometer technique (CRIMS) to detect the most commonly used isotopes, 13C, 15N, and D. Markey and Abramson (4, 5) developed the chemical reaction interface, a microwave-powered device that completely decomposed a complex molecule into its elements in the presence of helium. The addition of a reactant gas, for example, oxygen, formed stable oxidation products (CO2, SO2, H2O, etc.) that reflected the elemental composition of the original analyte and that were detected by a conventional quadrupole mass spectrometer. This technique differs from inductively coupled plasma mass spectrometry, which attempts to sample the components in the plasma directly.
CRIMS is different in that the inlet of the MS is some distance downstream from the microwave cavity. This allows time for reactive intermediates to form stable products, a process enabled by the presence of a reactant gas; otherwise pyrolysis and only partial conversion to volatile products would


© 1989 American Chemical Society