The Pesticide Chemist and Modern Toxicology - American Chemical

23 Statistical Considerations in the Evaluation of Toxicological Samples

JAMES J. TIEDE

Downloaded by NATL UNIV OF SINGAPORE on May 6, 2018 | https://pubs.acs.org Publication Date: August 10, 1981 | doi: 10.1021/bk-1981-0160.ch023

Bristol Laboratories, Syracuse, NY 13201

0097-6156/81/0160-0387$05.75/0 © 1981 American Chemical Society

Bandal et al.; The Pesticide Chemist and Modern Toxicology ACS Symposium Series; American Chemical Society: Washington, DC, 1981.

"You can prove anything with statistics." Few people, if any, have not heard this statement. Statistics is the science that deals with the collection, evaluation, interpretation and presentation of experimental data. As a science, it is governed by a set of fundamental underlying assumptions which, if violated, can invalidate the results of a statistical analysis. Thus, while the opening statement of this paper is not correct, a relatively accurate statement can be made after a slight modification: "With an improper analysis, you can prove anything with statistics." This is certainly true. If one ignores the assumptions underlying a statistical method, or employs a method of analysis that is improper for the experimental design, any desired conclusion can be obtained. However, is it not true of any scientific discipline that questionable results can be obtained if the fundamental rules are violated? The same is true of statistics. An appropriate statistical treatment of experimental data, one which will withstand critical peer review, will in general lead to unequivocal results.

The objective of a statistical evaluation of experimental data is to provide results which are meaningful to the experimenter. The most rigorous analysis may have less value than a simple graph if it does not aid the experimenter in the interpretation of the data. On the other hand, a simple analysis may prove misleading if unjustified assumptions about the data are made. Even the most appropriate analysis will not guarantee that the desired conclusion will be obtained. "It's obvious that there is a difference in this data. Why don't the statistics prove it?" This is a comment which has been heard by all consulting statisticians. There are numerous factors influencing a statistical evaluation. Among these are inadequate sample size, biased samples, improper design and uncontrolled exogenous variables. If one or more of these are not addressed appropriately, the results of the statistical evaluation may not agree with the biological or physical interpretation of the data.

Insufficient sample size is a particularly important factor. When there is a great deal of variability in a set of data (i.e. the signal-to-noise ratio is small), larger sample sizes are required if the experiment is to have a reasonable degree of sensitivity (the ability to detect differences among groups or between a sample and a reference standard). The problem is that samples are often expensive, both in dollars and in time. All of these factors (sensitivity, sample size and cost) must be considered before an experiment is conducted. This requires, however, that the statistician be consulted before the study is designed, and not just after the data are collected. Constant interplay between the statistician and the experimenter, from the beginning of a project until the end, will optimize the amount and quality of information which can be obtained.

"How can these data show significance? It's obvious that there are no real differences here." This is another frequent plea made to statisticians. It touches on a difficult problem: statistical significance versus clinical, biological or physical significance.
Often, one's intuition or experience will suggest that data which show statistical significance may not be biologically significant. As stated previously, the causes of such differences (contradictions) are numerous: improper design, uncontrolled variables in the experiment and a sample which is not representative of the population at large are but a few. It is the existence of this statistical-biological contradiction which underscores the need for constant interaction between experimenter and statistician. With continued interplay between statistician and biologist, the potential for contradiction can be minimized.

Statistics is a tool for scientists just as the brush is a tool for the painter. When used properly in the hands of an artist, the paintbrush can help transform a blank piece of canvas into a masterpiece. When improperly used, it can destroy that same piece of canvas. The same is true of statistics. When properly employed by a professional, the "picture" which the data convey can be extracted. When improperly applied, questionable results can be expected.

The purpose of this paper is to present and discuss some of the more commonly used statistical methods. The emphasis will be on understanding the concepts behind the methods and on interpreting the results of the analyses. Since this is an overview, detailed discussion will not be possible. Relevant references will be included for further details on the concepts presented.
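The point made above, that noisy data call for larger samples if an experiment is to have reasonable sensitivity, can be illustrated with the common normal-approximation formula for the number of subjects per group in a two-group comparison, n ≈ 2((z(1 - α/2) + z(1 - β))σ/δ)². The paper does not give this formula; it is used here only as an illustrative assumption (two equal groups, a known common standard deviation σ, a two-sided test at level α, target power 1 - β), and the function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate subjects per group needed to detect a true mean difference `delta`.

    Assumes two independent, equal-sized groups, a common known standard
    deviation `sigma` and a two-sided test at level `alpha` (normal approximation).
    """
    z = NormalDist()                    # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)                 # round up to a whole number of subjects


# Doubling the noise relative to the difference of interest roughly
# quadruples the required sample size.
print(n_per_group(delta=1.0, sigma=1.0))  # 16
print(n_per_group(delta=1.0, sigma=2.0))  # 63
```

The ratio σ/δ is the inverse of the signal-to-noise ratio mentioned in the text, which is why it enters the formula squared.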


Statistics Based on a Sample

A statistical population is the total collection of all possible values of the attribute with which one is concerned. For example, the blood pressures of American adult males or the body weights of Fischer rats are statistical populations. Generally, the population is so large that it is impossible to assess directly the effect that a chemical agent (food additive, drug, etc.) would have on the population. To administer the chemical agent to each member of the population and determine the effect, if any, would be impossible. As an alternative, the effect of the chemical agent on a small portion, or sample, of the population can be evaluated, and from this evaluation inferences can be made about the population at large. If the sample is characteristic of the population, it should provide good insight into the nature of the population. One of the functions of statistics is to make objective inferences about the population response based on the data obtained from the sample.

Confidence Intervals. In this section, some of the fundamental statistical concepts which relate to data collected from a sample will be discussed. Although only the one-sample problem will be discussed in detail, many of the concepts can be extended to cases where there is more than one sample. Consider the following (hypothetical) body weight data:

Change in body weights (g)
15.5  17.8  17.2  21.3  24.6  14.1  18.8   8.6  21.9  20.6
13.9  29.4  13.3  19.2  15.9   7.7  18.3  22.0  24.1  17.5

It is assumed that these data represent a random sample from the population, that is, a sample which was collected in such a manner that each member of the population had an equal chance of being chosen. A question which immediately arises is what inferences, based on the sample data, can be made about the mean and standard deviation of the population. The most common method of estimating the population mean is to use the average of the sample data, i.e. the sample mean, x̄. The standard deviation is most commonly estimated by the sample standard deviation

$$ s = \left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \Big/ (n-1) \right]^{1/2} $$

where x_i refers to the individual observations and n is the number of observations in the sample. Assuming that the data come from a normal (Gaussian or bell-shaped) distribution, the sample mean (x̄) and the sample standard deviation (s), in addition to their obvious advantages in terms of familiarity and interpretability, have optimal statistical properties. Both of these statistics (x̄ and s) are called point estimates because they provide a single-number estimate of the population parameter of interest. For the sample data, x̄ = 18.1 and s = 5.24.
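The two point estimates can be checked with Python's standard `statistics` module; `stdev` uses the n - 1 divisor shown in the formula above. A minimal sketch, with the 20 body weight changes typed in from the table:

```python
from statistics import mean, stdev

# Change in body weight (g) for the 20 animals in the example above.
weights = [15.5, 17.8, 17.2, 21.3, 24.6, 14.1, 18.8, 8.6, 21.9, 20.6,
           13.9, 29.4, 13.3, 19.2, 15.9, 7.7, 18.3, 22.0, 24.1, 17.5]

x_bar = mean(weights)  # point estimate of the population mean
s = stdev(weights)     # sample standard deviation, divisor n - 1

print(f"x_bar = {x_bar:.1f}, s = {s:.2f}")  # x_bar = 18.1, s = 5.24
```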


Consider, for the moment, the sample mean. Although x̄ provides the "best" estimate of the population mean, it is based on the sample and hence it is not exact. If another sample is taken from the same population, it is likely that a numerically different estimate of the population mean would result. A third sample would yield a third estimate of the mean. Thus, while x̄ provides a point estimate of the population mean, it would also be of value to have an interval which, with a specified degree of assurance or confidence, would contain the population mean. A confidence interval provides such an estimate. For a specified degree or level of confidence P (for example 90%), a one- or two-sided confidence interval for the population mean can be constructed from the sample data. The value of P can be altered depending on the desired level of confidence. The larger the value of P, that is, the greater the degree of confidence that is desired, the wider the corresponding interval will be. For the example data, the 95% (two-sided) confidence limits for the population mean are (15.6, 20.5). When interpreting these limits, it is not proper to say that there is a 95% probability that the population mean is in the interval (15.6, 20.5). The mean either is (probability = 1) or is not (probability = 0) within this particular interval. The correct statement is that one can be 95% confident that the population mean lies between 15.6 and 20.5; that is, the procedure used to construct the interval captures the population mean in 95% of repeated samples.
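The limits quoted above follow from the usual t-based interval x̄ ± t·s/√n. A minimal sketch, assuming normally distributed data; the Student t quantile is not in Python's standard library, so the tabulated value t(0.975, 19) = 2.093 is hard-coded (with SciPy available, `scipy.stats.t.ppf(0.975, 19)` returns it). The helper name `t_interval` is hypothetical.

```python
import math
from statistics import mean, stdev

weights = [15.5, 17.8, 17.2, 21.3, 24.6, 14.1, 18.8, 8.6, 21.9, 20.6,
           13.9, 29.4, 13.3, 19.2, 15.9, 7.7, 18.3, 22.0, 24.1, 17.5]

# Tabulated two-sided 95% Student t critical value for n - 1 = 19 degrees of freedom.
T_975_DF19 = 2.093


def t_interval(data, t_crit):
    """Two-sided confidence interval for the population mean: x_bar +/- t * s / sqrt(n)."""
    half_width = t_crit * stdev(data) / math.sqrt(len(data))
    x_bar = mean(data)
    return x_bar - half_width, x_bar + half_width


low, high = t_interval(weights, T_975_DF19)
print(f"95% CI: ({low:.1f}, {high:.1f})")  # 95% CI: (15.6, 20.5)

# A hypothesized mean of 25 lies outside the interval, so the data argue against it.
print(low <= 25 <= high)  # False
```

The last line anticipates the use of a confidence interval to judge whether the mean may equal a specific value.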
Another way of interpreting the confidence interval is as follows. Suppose that 100 random samples were taken from a single population and that 100 confidence intervals were computed. One could expect about 95 of the 100 confidence intervals to encompass the population mean.

Confidence intervals serve another useful purpose: they can help one determine whether the population mean may equal a specific value. To illustrate, suppose that prior to the collection of the example data, one wished to determine whether the population mean might be equal to 25. Since 25 does not fall within the confidence interval for the population mean, one could feel reasonably confident, based on the data, that the population mean does not equal 25. To illustrate the application of this concept to the two-sample problem, consider the comparison of two population means. For example, suppose a chemical agent was added to the feed of one group of mice while a second group had a chemical-free diet. Suppose further that one wished to assess the effect, if any, of the chemical on body weight. Based on the experimental data, one could construct a confidence interval for the difference of the two mean values. By virtue of the discussion above, if this confidence interval contained the value 0, the data would not demonstrate a difference between the mean values.
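The repeated-sampling interpretation can be checked by simulation. A minimal sketch, assuming a normal population whose mean and standard deviation (18 and 5.24) are chosen arbitrarily to resemble the example, and samples of size 20 so that the tabulated t(0.975, 19) = 2.093 applies; the function name `coverage` is hypothetical. Any single batch of 100 intervals will scatter around 95 hits, while the long-run fraction settles near 0.95.

```python
import math
import random
from statistics import mean, stdev

N = 20              # sample size, so each interval uses 19 degrees of freedom
T_975_DF19 = 2.093  # tabulated two-sided 95% t critical value for 19 df


def coverage(n_intervals, mu=18.0, sigma=5.24, seed=1):
    """Count simulated 95% t-intervals (samples of size N) that contain the true mean mu."""
    rng = random.Random(seed)  # seeded so each run is reproducible
    hits = 0
    for _ in range(n_intervals):
        sample = [rng.gauss(mu, sigma) for _ in range(N)]
        half_width = T_975_DF19 * stdev(sample) / math.sqrt(N)
        if abs(mean(sample) - mu) <= half_width:
            hits += 1
    return hits


print(coverage(100), "of 100 intervals contain the true mean")
```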


While only confidence intervals for the population mean have been discussed, confidence intervals can also be computed for the population standard deviation. In addition, as described in the last example, confidence intervals can be applied to more than one sample. The construction of these intervals is discussed in most statistical textbooks, including those cited at the end of this paper.

Tolerance Intervals. In some situations, given a sample from a population, one is interested in constructing limits not on the mean or standard deviation, but limits which will provide an idea of a range within which a certain percentage of the population will fall. Such limits are called tolerance limits. For example, one may wish to determine the tolerance limits which contain 95% of the population. If the population characteristics (mean and standard deviation) are known, tolerance limits can be precisely determined. However, since only the sample characteristics are generally available and, as previously discussed, these are not exact, tolerance limits can be determined only with a certain degree of confidence. For example, one could determine the limits which, with 90% confidence, will contain 95% of the population. To calculate tolerance limits, two values must first be specified: C, the proportion of the population to be covered (the "coverage"), and P, the confidence coefficient.
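For normally distributed data, two-sided tolerance limits take the form x̄ ± k·s, where the factor k depends on n, C and P and is usually read from published tables. The sketch below instead computes k with Howe's approximation, which is an assumption of this example and not necessarily how the limits quoted for this example were obtained; the lower 5% point of the χ² distribution with 19 degrees of freedom (10.117, for P = 95% and n = 20) is a tabulated value hard-coded here. For the body weight data with C = 75% it gives k of about 1.62 and limits of about 9.62 and 26.55, slightly wider than the limits quoted in the text.

```python
import math
from statistics import NormalDist, mean, stdev

weights = [15.5, 17.8, 17.2, 21.3, 24.6, 14.1, 18.8, 8.6, 21.9, 20.6,
           13.9, 29.4, 13.3, 19.2, 15.9, 7.7, 18.3, 22.0, 24.1, 17.5]

# Lower 5% point of chi-square with 19 degrees of freedom (tabulated);
# it corresponds to confidence P = 0.95 for n = 20.
CHI2_05_DF19 = 10.117


def howe_k(n, coverage, chi2_lower):
    """Howe's approximation to the two-sided normal tolerance factor k."""
    z = NormalDist().inv_cdf((1 + coverage) / 2)  # quantile enclosing the central `coverage`
    return z * math.sqrt((n - 1) * (1 + 1 / n) / chi2_lower)


k = howe_k(len(weights), coverage=0.75, chi2_lower=CHI2_05_DF19)
x_bar, s = mean(weights), stdev(weights)
low, high = x_bar - k * s, x_bar + k * s
print(f"k = {k:.3f}; tolerance limits ({low:.2f}, {high:.2f})")
```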
For given values of C and P, one- or two-sided tolerance limits for the population can be calculated. Using the data from the discussion above, suppose one wished to obtain a two-sided tolerance interval with C = 75% and P = 95%. The appropriate interval would be (9.7, 26.5). Based on this calculation, one is 95% confident that 75% of the population falls between 9.7 and 26.5. A more detailed description of the calculation of tolerance intervals can be found in references 3, 7 and 8.

Relationships Among Variables

The methods which have been presented to this point are used in the evaluation of one or more samples. In many instances, however, one wishes to evaluate the relationship between two or more variables. The relationship between the dose of a drug in the diet and the body weight of mice, the plasma levels of a drug as a function of time, or the way a new analytical method compares with a standard method are examples of such problems. In this section, discussion will focus on the case of two variables, X and Y. It will initially be assumed that the variable X (the independent variable) is measured with little or no error and is set before the experiment is conducted. The second variable, Y (the dependent variable), is the response variable, which depends on the value of X; Y is measured with error. The evaluation of the relationship between X and Y is termed a regression problem. Although the concepts presented here apply specifically to the linear regression model, they can be extended to polynomial regression, regression with many X variables (multiple regression) and nonlinear regression models.

Least Squares Analysis. In the case of linear regression, the theoretical relationship between the two variables X and Y can be expressed as y = A + Bx + error, where A is the theoretical (population) intercept and B is the theoretical slope. Given a sample of n independent pairs (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), the observed relationship between x and y is expressed as y = a + bx, where a is an estimate of A and b is an estimate of B. Numerous methods exist for estimating A and B; the most common approach is the method of least squares. The least squares method is based on minimizing the squared distance between each observed value y_obs and the corresponding "fitted" value y_fit = a + bx. This distance is represented by the interval d in Figure 1. Thus, the least squares method is based on minimizing

$$ \sum_{i=1}^{n} d_i^2 = \sum_{i=1}^{n} \left( y_{\mathrm{obs},i} - y_{\mathrm{fit},i} \right)^2 $$

Based on the assumptions that the data (the y's) have a normal distribution and that the standard deviation of the y's is the same at each x, the least squares estimates a and b of A and B are unique and unbiased. It is these properties which make the least squares estimates of A and B attractive. Since a and b are statistical estimates based on a sample, they have an error term (the standard error of the estimate) associated with them. Among all unbiased estimates of A and B, the least squares estimates have the smallest error term. This is the third important property of the least squares estimates. Consider the (hypothetical) data presented below.

Table 1

X    0      5      10     20     30
Y    27.3   40.5   63.1   91.6   117.7
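For a straight line, the least squares estimates have the closed form b = Sxy/Sxx and a = ȳ - b·x̄, where Sxx and Sxy are the centered sums of squares and cross-products. A minimal sketch applying them to Table 1 (the function name `least_squares` is hypothetical):

```python
from statistics import mean

# Table 1 data.
x = [0, 5, 10, 20, 30]
y = [27.3, 40.5, 63.1, 91.6, 117.7]


def least_squares(x, y):
    """Return the intercept a and slope b minimizing sum((y - (a + b*x))**2)."""
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx        # slope estimate
    a = y_bar - b * x_bar  # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b


a, b = least_squares(x, y)
print(f"y = {a:.1f} + {b:.2f}x")  # y = 28.3 + 3.06x
```

On Python 3.10 or later, `statistics.linear_regression(x, y)` returns the same slope and intercept.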

Inspection of a plot of Y versus X (Figure 1) suggests that a straight line might be a reasonable fit to the data. Using the method of least squares, the estimates of A and B are found to be a = 28.3 and b = 3.06. This means that, based on the data, the relationship between X and Y can be expressed as y = 28.3 + 3.06x.

Figure 1. Sample data for regression analysis

Prediction. The estimation of the parameters of a regression line is often the first step in an analysis. Frequently, the regression line is used to predict a value of Y for a new value of X. (The opposite problem, predicting X for a new Y, will be discussed later.) When the new value x* falls between the X's used to estimate the regression line (the regression limits), the prediction of the corresponding Y value, y*, is called interpolation. When x* falls beyond the regression limits, the prediction of Y is termed extrapolation.

Consider the problem of interpolation. Given a new observation x*, both point and interval estimates of Y can be calculated. The point estimate of Y is called the predicted value and is easily calculated as y* = a + bx*. The interval estimate for the predicted value is called the prediction interval. The calculation of a prediction interval is essentially the same as the calculation of a confidence interval. Both require the specification of the confidence coefficient P, and both intervals have the same interpretation: with confidence P, the interval contains the true (population) value. There are two types of prediction intervals which can be constructed in the regression problem: prediction intervals for a population mean (the mean response y* for a given x*) and prediction intervals for individual observations (i.e. the prediction interval for a particular patient). Conceptually, the difference between the two intervals is subtle. In the first case, one is interested in an interval for the population mean at a given value of X. In the second case, an individual observation from the population is of interest. In practice, the difference between the two can be substantial, since the prediction interval for the population mean is narrower than that for an individual.
To illustrate this difference, consider the data in Table 1. The 95% prediction interval for the (population) mean response at x* = 15 is (74.0, 84.4). If, however, one were interested in obtaining a prediction for a particular patient who had x* = 15, the interval would be (66.7, 91.7).

Extrapolation, the prediction of values beyond the range of the independent variable, is highly dependent on an important assumption: it is assumed that the relationship between the two variables remains constant for all values of X, up to and including the value of interest. G. Hahn (6) presents an excellent example of the potential danger involved with extrapolation. Consider the data in Figure 2. Inspection of the plot of Y versus X suggests a linear relationship between the two variables. Fitting a straight line to the data (Figure 3) and extrapolating out to x* = 50 produces a predicted value of y* = 138. The plot in Figure 2 is a hypothetical plot of the height of a random sample of males between the ages of 8 and 14. The extrapolation to x* = 50 suggests that a 50-year-old male would have a height of more than 11 feet. The assumption that the linear relationship between X and Y will continue to hold at x* = 50 is obviously erroneous. Thus, when one extrapolates beyond the regression limits, one should do so very cautiously, especially as one gets farther away from the regression endpoints. The concepts of point and interval prediction estimates discussed for interpolation can also be applied to the extrapolation problem.
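Both kinds of prediction interval use the residual standard deviation s (with n - 2 degrees of freedom) and the factor 1/n + (x* - x̄)²/Sxx; the interval for an individual observation adds 1 under the square root to account for the new observation's own error. A minimal sketch under the usual normal-error assumptions, with the tabulated t(0.975, 3) = 3.182 hard-coded; the function name `prediction_intervals` is hypothetical.

```python
import math
from statistics import mean

# Table 1 data.
x = [0, 5, 10, 20, 30]
y = [27.3, 40.5, 63.1, 91.6, 117.7]

# Tabulated two-sided 95% Student t critical value for n - 2 = 3 degrees of freedom.
T_975_DF3 = 3.182


def prediction_intervals(x, y, x_new, t_crit):
    """Return (y_hat, interval for the mean response, interval for one new observation)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    # Residual standard deviation with n - 2 degrees of freedom.
    s = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = a + b * x_new
    leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
    half_mean = t_crit * s * math.sqrt(leverage)       # mean response at x_new
    half_indiv = t_crit * s * math.sqrt(1 + leverage)  # a single new observation
    return (y_hat,
            (y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_indiv, y_hat + half_indiv))


y_hat, mean_pi, indiv_pi = prediction_intervals(x, y, 15, T_975_DF3)
print(f"y* = {y_hat:.1f}")
print("mean response:", [round(v, 1) for v in mean_pi])
print("individual:   ", [round(v, 1) for v in indiv_pi])
```

For x* = 15 the sketch gives interval widths of about 10.4 and 25.0, the same widths as the intervals quoted above, and centers both intervals on the fitted value 28.3 + 3.06(15) ≈ 74.2. Because (x* - x̄)² grows without bound, the same function shows how quickly both intervals widen when x* moves beyond the regression limits.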

Figure 2. Linear relationship between variables X and Y

Figure 3. Extrapolation of data in Figure 2 to x = 50. The variable X is age in years and the variable Y is height in inches.

Tests for goodness of fit. While a linear model can be fit to any set of data, a straight line may not be the best model. It is possible in the regression analysis to check for lack of fit, that is, the inability of the model to adequately describe the relationship between X and Y. This statistical test requires that two or more independent observations (measurements) be made at each level of X. Independent here means that unique responses must be obtained: determining the white blood cell count from the same sample three times does not provide three independent observations. Although the test for lack of fit cannot indicate what the appropriate model would be, it enables the experimenter to assess the validity of the assumed model. This is why statisticians frequently request that multiple observations be obtained at the various levels of the X variable.

Logarithmic Transformations. It is common in the biological sciences to find that, while the relationship between X and Y is not linear, a logarithmic transformation of one or both of the variables will produce a straight-line relationship. The regression methods described above can be applied to the transformed data in order to estimate the parameters of the model, to make predictions of future values and to obtain the corresponding confidence limits. Caution must be used, however, when interpreting the results of analyses based on transformed data, particularly when discussing the results in the original scale of measurement.
All results (parameter estimates, confidence limits, etc.) pertain to the transformed data. Expressing the results of the statistical evaluation in the original scale of measurement (i.e. by taking antilogarithms) does not preserve the statistical interpretation. For example, the 95% confidence limits for a predicted value of log Y are not, after taking antilogs, the 95% confidence limits for the value of Y in the original scale of measurement. This fact can be illustrated using the following set of data: 15, 17, 10, 22, 13, 15, 18. The mean and 95% confidence limits for these data are 15.71 and (12.18, 19.24), respectively. Taking the natural logarithm of each of the values, the mean and 95% confidence limits are 2.73 and (2.50, 2.96). Taking antilogs of these values, one obtains a mean and 95% confidence limits in the original scale of measurement of 15.31 and (12.14, 19.30), respectively. Comparison with the first set of statistics reveals distinct differences. The discussion above applies to most other transformations which are used to linearize a set of data, e.g. exponentiation, taking roots, raising to a power, probits, logits, etc. Only for those transformations which are themselves linear (that is, of the form y = b + mx) will the statistical interpretations be preserved before and after transformation.
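The comparison above can be reproduced by running the same t-interval on the raw values and on their natural logarithms, then back-transforming the log-scale results with exp(). A minimal sketch; the tabulated t(0.975, 6) = 2.447 is hard-coded because the standard library has no t quantile, and the helper name `t_interval` is hypothetical.

```python
import math
from statistics import mean, stdev

data = [15, 17, 10, 22, 13, 15, 18]

# Tabulated two-sided 95% Student t critical value for n - 1 = 6 degrees of freedom.
T_975_DF6 = 2.447


def t_interval(values, t_crit):
    """Return (mean, lower, upper) of the two-sided t confidence interval."""
    m = mean(values)
    half = t_crit * stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Analysis on the original scale.
raw = t_interval(data, T_975_DF6)

# Analysis on the natural-log scale, then back-transformed with exp().
log_stats = t_interval([math.log(v) for v in data], T_975_DF6)
back = tuple(math.exp(v) for v in log_stats)

print("original scale:  ", [round(v, 2) for v in raw])
print("log scale:       ", [round(v, 2) for v in log_stats])
print("back-transformed:", [round(v, 2) for v in back])
```

The printed values agree with those quoted in the text to the stated precision. The back-transformed centre, 15.31, is the geometric mean of the data rather than the arithmetic mean, which is one way to see why the two sets of results differ.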


Statistical Calibration. The discussion to this point has dealt with the prediction of y* for a given x*. In many problems, such as radioimmunoassay, one wishes to predict X from a given (observed) value of Y. Such problems are called inverse prediction or calibration problems. A general outline of the calibration problem can be described as follows. Two variables, X (the independent variable) and Y (the dependent variable), are such that X is difficult (or impossible) to measure directly while Y is relatively easy to measure. In the first part of the experiment, the corresponding Y's are obtained for n known values of X. These data are frequently called the calibration or standards data. Later, m additional values of Y (i.e. responses from m patients) are obtained, and the objective is to estimate the corresponding values of X. The first step in the calibration analysis is to determine the relationship between X and Y by fitting a model to the calibration data using the method of least squares. When estimating the parameters of the calibration line, it is not correct to reverse the roles of X and Y and then to use the procedure for prediction in the regression analysis. The theory of least squares is based on the assumption that the X's are error free and that the Y's are measured with error; to regress X on Y violates this fundamental assumption. The correct procedure for the calibration problem is to regress Y on X, as in the regression problem, in order to estimate the calibration line.
Predicted values for X can be obtained as follows. The point estimate x' of X for a new value y' is easily calculated as x' = (y' - a)/b, where a and b are the least squares estimates of the intercept A and slope B of the calibration line. The calculation of an interval estimate for X is more difficult than the calculation of prediction limits for y* in the regression problem. In the calibration problem, there are two error terms which must be considered: the error in establishing the calibration line and the error associated with measuring y'. Since the calibration line is based on a random sample of observations, it is not exact. A different set of Y's (observed at the same levels of X as before) would have resulted in numerically different estimates of A and B. This lack of precision must be accounted for in the calculation of the interval estimate for x'. In the same way, if a second sample were taken from the same patient, two numerically different Y's would likely result. Thus, this error, or lack of precision, must also be considered in the calculation of the interval estimate for x'. The procedure for incorporating these two error terms in the calculation of the interval estimate for x' is illustrated in Figure 4. Although such intervals may appear to be relatively large, it would be inappropriate to ignore one of the error terms in order to reduce the interval width. To illustrate these concepts, suppose one wished to obtain point and interval estimates for X when y' = 50 in the data in Figure 1. The point estimate of X is calculated to be 7.1. Using the procedure outlined above, the interval estimate for X is found to be (2.74, 11.18).
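One standard way to carry both error terms into the interval is to find the values of X at which the horizontal line Y = y' crosses the lower and upper prediction limits of the fitted line. This appears to be what Figure 4 depicts. Let s be the residual standard deviation of the calibration line, let x̄ and Sxx be the mean and corrected sum of squares of the n standards, and let t be the Student t value with n - 2 degrees of freedom. The limits are then the two roots of

(y' - a - b x)² = t² s² (1/m + 1/n + (x - x̄)²/Sxx).

The 1/m term reflects the error in measuring y' (the mean of m responses). The remaining terms reflect the error in the calibration line.

The Python sketch below implements this under the usual assumptions of independent, normally distributed errors with constant variance. The function name `inverse_prediction` and the standards data are hypothetical. Because this is not the data of Figure 1, the sketch does not reproduce 7.1 and (2.74, 11.18).

```python
import math
from statistics import fmean

def inverse_prediction(x, y, y_new, t, m=1):
    """Point estimate and prediction interval for X, given an observed response.

    x, y  : calibration (standards) data; Y is regressed on X.
    y_new : observed response y' (the mean of m replicate responses).
    t     : two-sided Student t value with n - 2 degrees of freedom,
            supplied by the caller (e.g. read from a t table).
    """
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    # Residual variance about the calibration line (n - 2 degrees of freedom).
    s2 = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

    x_point = (y_new - a) / b

    # Solve (y_new - a - b*x)^2 = t^2 s^2 (1/m + 1/n + (x - xbar)^2 / sxx).
    # With u = x - xbar this is a quadratic in u; its roots are where the
    # horizontal line Y = y_new meets the lower and upper prediction limits.
    dy = y_new - ybar
    q = t * t * s2
    coef = b * b - q / sxx
    if coef <= 0:
        raise ValueError("slope not significantly different from zero: limits are unbounded")
    root = math.sqrt(q * (coef * (1 / m + 1 / n) + dy * dy / sxx))
    lower = xbar + (b * dy - root) / coef
    upper = xbar + (b * dy + root) / coef
    return x_point, (lower, upper)

# Hypothetical standards data (NOT the data of Figure 1); n = 7, so 5 df and
# t = 2.571 for 95% two-sided limits.
x_std = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
y_std = [12.1, 19.8, 30.5, 41.0, 47.9, 60.2, 69.8]
est, (lo, hi) = inverse_prediction(x_std, y_std, y_new=50.0, t=2.571)
print(f"x' = {est:.2f}, interval = ({lo:.2f}, {hi:.2f})")
```

The interval is generally not symmetric about x'. Averaging m > 1 replicate responses shrinks only the 1/m term, so the error in the calibration line still sets a lower bound on the interval width.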


Bandal et al.; The Pesticide Chemist and Modern Toxicology ACS Symposium Series; American Chemical Society: Washington, DC, 1981.


Errors in Both Variables. Until now in the discussion of the relationship between two variables, it has been assumed that the X variable is measured without error or with negligible error. If X is measured with non-negligible error, one approach to evaluating the relationship between X and Y is correlation analysis. Given two variables X and Y, a measure of the linear association between them is given by the correlation coefficient r. For a given problem, r can take any value from -1 to +1 inclusive. A value of r = -1 reflects a "perfect" negative linear relationship between X and Y (Figure 5a). A value of r = +1 reflects a perfect positive linear relationship between X and Y (Figure 5b). The absence of a linear relationship between X and Y (Figure 5c) is suggested by r = 0. It is important to note that this does not mean that X and Y are unrelated. To illustrate, consider the hypothetical data in Figure 5d. For these data, r would equal 0; however, it is rather apparent that there is a (nonlinear) relationship between the two variables. Thus, when determining the correlation coefficient between two variables, it is important to keep in mind that r measures only the linear association between them.
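The correlation coefficient can be computed as r = Sxy / sqrt(Sxx Syy), where Sxx, Syy and Sxy are the corrected sums of squares and cross-products. The short Python sketch below uses a hypothetical helper `pearson_r` and small hypothetical data sets. These reproduce the situations of Figure 5a, 5b and 5d. The last case gives r = 0 even though Y is an exact function of X.

```python
import math
from statistics import fmean

def pearson_r(x, y):
    """Sample correlation coefficient: a measure of LINEAR association only."""
    xbar, ybar = fmean(x), fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [3 - 2 * xi for xi in x]))  # perfect negative line: r = -1 (Figure 5a)
print(pearson_r(x, [1 + 2 * xi for xi in x]))  # perfect positive line: r = +1 (Figure 5b)
print(pearson_r(x, [xi ** 2 for xi in x]))     # y = x**2: r = 0, yet y depends on x (Figure 5d)
```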
Suppose that, in the evaluation of two variables X and Y, both of which are measured with non-negligible error, one wishes to determine the functional relationship between the two variables and not just the correlation coefficient. To illustrate, suppose the theoretical relationship between the dose D of a drug and the response metameter R is given by the model R = A + B*D + error, and that one wishes to estimate the parameters of the model. Suppose further that both D and R are measured with error, so that what are actually observed are d = D + error1 and r = R + error2. Since both D and R are measured with error, the regression methods previously described cannot be used to estimate the parameters A and B of the model. The solution to this problem requires that additional assumptions be made about the data. Two particular approaches have been proposed for addressing this problem, which is frequently referred to as the errors-in-variables problem. One is to assume that the ratio of the error in R to the error in D is a known constant, that is, to assume a value for k = error(R)/error(D). Even though the experimenter does not know the exact values of the standard deviations of the errors in D and R, he may feel, for example, that the error in R is of the same order of magnitude as that in D (i.e. k = 1). Or, the experimenter may be able to obtain a reasonable estimate of k based on previous experience. With the determination of k, estimates of A and B can be obtained. The second approach is called the controlled-independent-variable approach.
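The first approach, which assumes a known ratio k = error(R)/error(D), leads to what is now usually called Deming regression. When k = 1 it is orthogonal regression. The sketch below assumes independent, normally distributed errors in d and r with constant standard deviations whose ratio is k, so the error-variance ratio is k². Under these assumptions the slope estimate is

b = [Srr - k²Sdd + sqrt((Srr - k²Sdd)² + 4k²Sdr²)] / (2Sdr), with a = r̄ - b d̄.

The function name `deming_fit` and the data are hypothetical illustrations.

```python
import math
from statistics import fmean

def deming_fit(d, r, k):
    """Estimate A and B in R = A + B*D when both d and r are measured with error.

    k is the assumed ratio error(R)/error(D) of the error standard deviations,
    so the error-variance ratio used in the formula is k**2.
    """
    dbar, rbar = fmean(d), fmean(r)
    sdd = sum((di - dbar) ** 2 for di in d)
    srr = sum((ri - rbar) ** 2 for ri in r)
    sdr = sum((di - dbar) * (ri - rbar) for di, ri in zip(d, r))
    if sdr == 0:
        raise ValueError("d and r show no linear association")
    lam = k * k
    diff = srr - lam * sdd
    b = (diff + math.sqrt(diff * diff + 4 * lam * sdr * sdr)) / (2 * sdr)
    a = rbar - b * dbar
    return a, b

# Hypothetical observed doses and responses, assuming errors of similar size (k = 1).
d_obs = [1.0, 2.0, 3.0, 4.0, 5.0]
r_obs = [2.1, 3.9, 6.2, 7.8, 10.1]
a_hat, b_hat = deming_fit(d_obs, r_obs, k=1.0)
print(f"A = {a_hat:.3f}, B = {b_hat:.3f}")
```

As k grows (error in d negligible compared with error in r), the slope approaches the ordinary least squares slope of r on d. As k shrinks toward 0, it approaches the slope implied by regressing d on r. The controlled-independent-variable approach, by contrast, does not require an assumed value of k.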
In this approach, the experimenter decides before the experiment what values of d will be observed (hence the name controlled-independent-variable). For example, the experimenter may choose to obtain responses at d = 5, 10 and 20


Figure 4. Calculation of prediction intervals in calibration analysis: y' is the observed value of Y; x' is the predicted value of X corresponding to y'; x'₁ and x'₂ are the lower and upper prediction limits, respectively.


Figure 5. Possible outcomes from correlation analysis: (a) r = -1, negative linear relationship; (b) r = +1, positive linear relationship; (c) r = 0, no linear relationship; (d) r = 0, X and Y are related but not linearly.


mg. Although the true dose D will not equal 5, 10 or 20 mg, what is important is that responses are obtained at the observed doses d = 5, 10 and 20 mg. Assuming the data are collected under these conditions, estimates of A and B can subsequently be obtained by the least squares methods described earlier. To summarize, when one wishes to evaluate the relationship between two variables, one of which is fixed (the independent variable) and one which is allowed to vary (the dependent variable), the analysis is termed a regression problem. A special type of regression problem is called calibration. In the calibration problem, one wishes to predict (future) values of the independent variable for given values of the dependent variable. If, on the other hand, both variables are measured with error, correlation analysis and errors-in-variables analysis are two approaches which one can use in the evaluation of the data. More extensive details of regression and correlation analysis are found in the references cited at the end of this paper. The calibration problem is discussed in references 2, 8, 9 and 10.

Data Smoothing

The concepts which have been previously presented dealt specifically with the analysis of experimental data. An equally important aspect of the evaluation of experimental data is how the data are presented and, in particular, the graphical display of data. One of the purposes of graphing data is to illustrate trends or cycles which may exist in the data. If, however, the data are "noisy" (i.e.
there are large variations due to random or experimental error), it is often difficult to observe the important trends or patterns in the data. The elimination of the "noise" from a graph is called data smoothing. There are many methods which are used for data smoothing. Three methods which are particularly useful because of their simplicity are the method of averages, the method of medians and the method of differences. In the discussion to follow, these three methods will be described and applied to the data presented in Figure 6a and Table 2.

Smoothing by averages involves replacing an observed value by the average of that value and surrounding observations. To maintain symmetry, an equal number of observations is generally taken on either side of the value to be smoothed. The smoothed value can be calculated by assigning equal weights to all data in the average (i.e. the arithmetic average) or by computing a weighted average of the data. 1-2-1 smoothing, in which the center value receives twice the weight of the outside values, is an example of the use of a weighted average. Figure 6b presents a smoothed plot of the data in Figure 6a in which each observation is replaced by the arithmetic average of the observation and one value to either side of it. Thus, for example, the value -0.28 is replaced by -0.76 (see Table 2). It should be noted that in the smoothed curve two points are "lost", the first and the last. This is because a three-point smoothed value cannot be obtained for these two points.
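The method of averages can be sketched in a few lines of Python. The function `moving_average` below is a hypothetical helper, not taken from the paper. It applies a centred window of weights: (1, 1, 1) gives the three-point arithmetic average used for Figure 6b, and (1, 2, 1) gives 1-2-1 smoothing. The data are the "Original Data" column of Table 2. Because Table 2 lists the data to two decimals, a few recomputed averages may differ from the tabulated ones in the last digit.

```python
def moving_average(data, weights=(1, 1, 1)):
    """Replace each value by a weighted average of itself and its neighbours.

    The window is centred, so len(weights) should be odd; the
    len(weights) // 2 values at each end have no full window and are dropped.
    """
    w = len(weights)
    total = sum(weights)
    return [
        sum(wt * v for wt, v in zip(weights, data[i:i + w])) / total
        for i in range(len(data) - w + 1)
    ]

# "Original Data" column of Table 2 (37 values).
table2 = [
    -0.82, -0.28, -1.17, -0.24, -0.12, 0.76, 0.81, -0.04, 0.27, -0.46,
    -1.05, -1.41, -0.89, -1.22, -1.41, -0.55, -0.23, 0.16, 0.59, 0.53,
    -0.42, -0.40, -0.70, -2.16, -2.10, -1.71, -1.93, -0.83, -0.92, -0.57,
    0.25, -0.46, -1.03, -1.07, -1.31, -2.06, -2.08,
]

equal = moving_average(table2)                # three-point arithmetic average
weighted = moving_average(table2, (1, 2, 1))  # 1-2-1 smoothing
print(len(equal))                             # 35: first and last points lost
print([round(v, 2) for v in equal[:4]])       # [-0.76, -0.56, -0.51, 0.13]
```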

Table 2. Data Used to Illustrate Methods of Smoothing

Obs.   Original   Smoothed by   Smoothed by   Smoothed by
       Data       Averages      Medians       12th Differences
  1    -0.82
  2    -0.28      -0.76         -0.82
  3    -1.17      -0.56         -0.28
  4    -0.24      -0.51         -0.24
  5    -0.12       0.13         -0.12
  6     0.76       0.48          0.76
  7     0.81       0.51          0.76
  8    -0.04       0.35          0.27
  9     0.27      -0.08         -0.04
 10    -0.46      -0.41         -0.46
 11    -1.05      -0.97         -1.05
 12    -1.41      -1.11         -1.05
 13    -0.89      -1.17         -0.89         -0.06
 14    -1.22      -1.17         -1.22         -0.94
 15    -1.41      -1.06         -1.22         -0.24
 16    -0.55      -0.73         -0.55         -0.31
 17    -0.23      -0.21         -0.23         -0.11
 18     0.16       0.17          0.16         -0.60
 19     0.59       0.43          0.53         -0.22
 20     0.53       0.23          0.53          0.54
 21    -0.42      -0.10         -0.40         -0.69
 22    -0.40      -0.51         -0.42          0.05
 23    -0.70      -1.09         -0.70          0.35
 24    -2.16      -1.66         -2.10         -0.76
 25    -2.10      -1.99         -2.10         -1.22
 26    -1.71      -1.92         -1.93         -0.49
 27    -1.93      -1.49         -1.71         -0.52
 28    -0.83      -1.23         -0.83         -0.27
 29    -0.92      -0.77         -0.83         -0.69
 30    -0.57      -0.42         -0.57         -0.73
 31     0.25      -0.26         -0.46         -0.34
 32    -0.46      -0.41         -0.46         -0.99
 33    -1.03      -0.85         -1.03         -0.61
 34    -1.07      -1.14         -1.07         -0.67
 35    -1.31      -1.48         -1.31         -0.61
 36    -2.06      -1.81         -2.06          0.11
 37    -2.08                                   0.03


Smoothing by medians is very similar to smoothing by averages. In this method, an observation is replaced by the median of itself and the 2 neighboring (adjacent) points. (The number 2 is arbitrary. Other values, 4 or 6 for example, could have been used. However, this may cause too much smoothing, and the overall character of the data may be lost.) To illustrate, in the example data, the value -0.28 is replaced by -0.82, the median of -0.82, -0.28 and -1.17. The advantage of smoothing by medians is that if there is an occasional outlier observation (an excessively large or small value) in the data, the smoothed plot will not be affected by it. When smoothing by averages, the existence of outliers will still be apparent in the smoothed plot. Figure 6c and Table 2 illustrate the effect of smoothing the data in Figure 6a using the method of medians.

Smoothing by differencing is generally used when there are cycles in the data which might mask underlying trends, or when observations are dependent on previous values. Diurnal variation in blood pressure is an example of a cycle dominating a set of data. If the blood pressures of a patient treated with an antihypertensive drug were taken hourly for three days, the effect of the drug would probably not be evident in a graphical presentation of the data because of the diurnal variations. The removal of this cycle by differencing would reveal the overall decreasing trend in the data. Consider, for the moment, data which are collected hourly.
To obtain the smoothed data using the method of differencing, each observation is replaced by the difference between itself and the observation obtained x hours previously. First order differences involve obtaining the difference between the "current" observation and the previous observation; second order differencing involves the difference between the "current" observation and the observation 2 hours previously, etc. The level of differencing will depend on the period of the cycle. For example, if a set of data has a six-hour cycle and the data are collected hourly, sixth order differences would be appropriate. In Figure 6d, the data in Figure 6a are smoothed by taking twelfth order differences. These "de-cycled" data can now be examined for the existence of trends. Note that the first 12 observations are "lost" in the smoothing. Averages, medians and differences are but three methods which can be used for data smoothing. The advantage of these methods over other methods, such as exponential smoothing, is that they are easily applied to most sets of data.
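The methods of medians and of differences can be sketched in the same way, using hypothetical helpers `running_median` and `lag_difference`. What the paper calls an xth order difference is the observation at hour t minus the observation at hour t - x. Current time-series usage calls this a lag-x (or seasonal) difference; "second order differencing" now usually means differencing twice. The example applies both methods to the Table 2 data. It also uses a hypothetical hourly series with a six-hour cycle and a downward trend.

```python
from statistics import median

def running_median(data, span=3):
    """Replace each value by the median of itself and its span - 1 neighbours.
    span must be odd; the span // 2 points at each end are dropped."""
    if span % 2 == 0:
        raise ValueError("span must be odd")
    half = span // 2
    return [median(data[i - half:i + half + 1]) for i in range(half, len(data) - half)]

def lag_difference(data, lag):
    """The paper's 'lag-th order' difference, data[t] - data[t - lag].
    The first `lag` observations are lost."""
    return [data[t] - data[t - lag] for t in range(lag, len(data))]

# "Original Data" column of Table 2 (37 values).
table2 = [
    -0.82, -0.28, -1.17, -0.24, -0.12, 0.76, 0.81, -0.04, 0.27, -0.46,
    -1.05, -1.41, -0.89, -1.22, -1.41, -0.55, -0.23, 0.16, 0.59, 0.53,
    -0.42, -0.40, -0.70, -2.16, -2.10, -1.71, -1.93, -0.83, -0.92, -0.57,
    0.25, -0.46, -1.03, -1.07, -1.31, -2.06, -2.08,
]

print(running_median(table2)[:3])        # [-0.82, -0.28, -0.24], as in Table 2

# An isolated outlier disappears under the running median.
spiky = [1.0, 1.0, 9.0, 1.0, 1.0]
print(running_median(spiky))             # [1.0, 1.0, 1.0]

# Hypothetical hourly series: a 6-hour cycle plus a trend of -0.5 units/hour.
# Differencing at lag 6 removes the cycle and leaves the constant -0.5 * 6.
cycle = [0.0, 3.0, 5.0, 3.0, 0.0, -2.0]
series = [cycle[t % 6] - 0.5 * t for t in range(24)]
print(lag_difference(series, 6))         # eighteen values, all -3.0

print(len(lag_difference(table2, 12)))   # 25: the first 12 observations are lost
```

Because differencing at the period of the cycle cancels the cycle exactly, a steady trend then appears as a roughly constant offset in the differenced data rather than as a visible slope.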


Summary

To summarize, a number of statistical methods have been briefly presented in order to establish a conceptual framework for the reader. The discussions have not been detailed, since such an in-depth accounting of each concept would not serve our purposes here. Detailed discussions of each statistical method are available, and references have been cited for those who seek such depth. Statistics is a very viable tool in modern scientific research. With continued interaction between the scientist and the statistician, the resulting research can only be enhanced.

Literature Cited

1. Bennett, Carl A. and Franklin, Norman L. Statistical Analysis in Chemistry and the Chemical Industry. 1954. John Wiley and Sons, Inc. New York.
2. Box, G. E. P., Hunter, W. G. and Hunter, J. S. Statistics for Experimenters. 1978. John Wiley and Sons, Inc. New York.
3. Dixon, Wilfrid J. and Massey, Frank J. Introduction to Statistical Analysis. Third Edition. 1975. McGraw-Hill Book Company. New York.
4. Draper, N. R. and Smith, H. Applied Regression Analysis. 1966. John Wiley and Sons, Inc. New York.
5. Graybill, F. A. An Introduction to Linear Statistical Models. Volume I. 1961. McGraw-Hill Book Company, Inc. New York.
6. Hahn, Gerald J. The Hazards of Extrapolation. 1978. Chemical Technology 8, pp 699-701.
7. Natrella, Mary G. Experimental Statistics. National Bureau of Standards Handbook 91. 1966. U.S. Government Printing Office. Washington, D.C.
8. Ostle, Bernard and Mensing, Richard W. Statistics in Research. Third Edition. 1975. The Iowa State University Press. Ames.
9. Snedecor, George W. and Cochran, William G. Statistical Methods. Sixth Edition. 1967. The Iowa State University Press. Ames.
10. Sokal, Robert R. and Rohlf, F. James. Biometry. 1969. W. H. Freeman and Company. San Francisco.

RECEIVED April 20, 1981.
