Interpreting Epidemiological Studies - Advances in Chemistry (ACS

Jul 22, 2009 - Division of Environmental and Occupational Health, School of Public ... of the major potential biases that may occur in epidemiological...
4 Interpreting Epidemiological Studies


George Maldonado

Division of Environmental and Occupational Health, School of Public Health, University of Minnesota, Minneapolis, MN 55455

A primary objective of an epidemiological study is to obtain a valid (unbiased) estimate of the effect of an exposure on disease occurrence. Most epidemiological studies are observational (nonexperimental) and consequently are subject to more potential biases than experiments. This chapter (1) discusses and gives examples of the major potential biases that may occur in epidemiological studies, (2) discusses strategies for controlling each bias, and (3) discusses how to interpret epidemiological results when strategies for controlling bias fail to achieve full control.

0065-2393/94/0241-0029$08.00/0
© 1994 American Chemical Society

Investigators in most scientific disciplines employ an experimental design in which subjects are randomly allocated to different exposure groups, many of the study conditions are precisely controlled, and relevant factors are precisely measured. Epidemiologists rarely have this luxury. Most epidemiological studies use an observational (nonexperimental) design. Instead of randomly allocating subjects to different levels of an exposure whose effects the investigator wants to study, epidemiologists observe groups of people and attempt to ascertain accurately the level of exposure that each person has experienced. Instead of precisely controlling experimental conditions, epidemiologists observe extraneous factors that might influence the study outcome, and they attempt to control them with observational designs and statistical analysis techniques. Instead of precisely measuring relevant factors, epidemiologists frequently "make do" with imprecise measurements and the imperfect memory of subjects.

Because of the observational nature of most epidemiological studies, the estimates one computes from epidemiological data may suffer from bias (systematic deviation of results from truth). Therefore, to correctly interpret epidemiological studies, one must

1. recognize and understand the important potential sources of bias
2. evaluate the magnitude and direction of potential biases

In other words, interpreting the results of an epidemiological study is like shaping a rough diamond into a finished product. The rough diamond is like the potentially biased estimates of effect one obtains from an observational epidemiological study. Shaping the diamond is like making inferences about the magnitude and direction of potential biases. The finished stone is like the combination of estimates of effect and inferences about biases.

Making inferences about the magnitude and direction of possible biases is not a simple task. It takes much knowledge and experience to "shape the diamond". This chapter outlines a strategy for making inferences about potential biases and highlights the important issues to be considered. Rothman (1), Kelsey et al. (2), Checkoway et al. (3), and Greenland (4) give more in-depth discussions of these issues.

This chapter assumes a familiarity with basic epidemiological terminology such as relative risk, odds ratio, and risk factor. It also assumes a basic understanding of cohort and case-control epidemiological study designs. The two preceding chapters in this volume and Ahlbom and Norell (5) or Walker (6) give an introduction to these issues.

A Basic Strategy for Interpreting Epidemiological Studies

The three most important types of bias that can occur in epidemiological studies are the following:

1. information bias
2. confounding
3. selection bias

A basic strategy for interpreting epidemiological results is to evaluate the magnitude and direction of each of these three biases. This chapter (1) defines and gives examples of these three biases, (2) discusses strategies for controlling each bias, and (3) discusses how to interpret epidemiological results when bias may be present.

Information Bias

Bias due to errors in measuring (or classifying) the study variables is called information bias.


Information bias is perhaps the most important bias in epidemiological studies, for several reasons:

1. There are sources of measurement error in nearly all epidemiological studies. For example, measurement error may be caused by imperfect recall of subjects, by improper calibration of measurement equipment, and by use of a proxy variable as a substitute for the actual variable of interest.
2. Nearly all sources of measurement error cause information bias.
3. The magnitude of information bias can be large.
4. Information bias can usually be handled only by minimizing measurement error or by making inferences about the magnitude and direction of bias. Unlike some of the other biases, it usually cannot be eliminated by data analysis techniques.

Hypothetical Example. Consider a case-control study of an occupational chemical exposure and lung cancer. Assume that if all subjects were correctly classified into "exposure" and "disease" categories, an investigator would observe the following data:

Perfectly Classified Data (Truth)

Exposure Category    Case    Control
Exposed               200        200
Unexposed              50        200

The correct (true) odds ratio (OR) is

$$\mathrm{OR} = \frac{200 \times 200}{50 \times 200} = 4.0$$

In this hypothetical example, exposure to the chemical multiplies the risk of disease by a factor of 4. Assume also the following:

1. Data on actual chemical exposure are not available, so subjects are classified into exposure categories on the basis of their job titles.
2. As a result of this imperfect method of determining exposure, 50% of exposed subjects are misclassified into the "unexposed" group, and all unexposed are correctly classified.


Because of the foregoing assumptions, the investigator would observe the following data:

Misclassified Study Data

Exposure Category    Case    Control
Exposed               100        100
Unexposed             150        300

Using the misclassified study data, the estimated OR is

$$\mathrm{OR} = \frac{100 \times 300}{150 \times 100} = 2.0$$

which is not equal to the true OR. Information bias makes it appear that exposure to the chemical multiplies the risk of disease by a factor of 2 (instead of the true factor of 4).
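The arithmetic of this example can be reproduced with a short Python sketch. The function names `odds_ratio` and `misclassify` and the sensitivity/specificity parameterization are illustrative assumptions, not part of the chapter; the counts and the 50% misclassification rate come from the hypothetical example.

```python
def odds_ratio(case_exp, case_unexp, ctrl_exp, ctrl_unexp):
    """Odds ratio for a 2x2 case-control table."""
    return (case_exp * ctrl_unexp) / (case_unexp * ctrl_exp)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected observed counts when exposure is classified imperfectly.

    sensitivity: probability a truly exposed subject is classified exposed.
    specificity: probability a truly unexposed subject is classified unexposed.
    (Both names are assumptions made for this sketch.)
    """
    obs_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    obs_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return obs_exposed, obs_unexposed


# True (perfectly classified) counts from the hypothetical example.
true_cases = (200, 50)        # (exposed, unexposed)
true_controls = (200, 200)    # (exposed, unexposed)

# Job-title classification: half of the exposed are called unexposed,
# and every unexposed subject is classified correctly.
obs_cases = misclassify(*true_cases, sensitivity=0.5, specificity=1.0)
obs_controls = misclassify(*true_controls, sensitivity=0.5, specificity=1.0)

true_or = odds_ratio(*true_cases, *true_controls)
estimated_or = odds_ratio(*obs_cases, *obs_controls)

print(f"true OR = {true_or:.1f}, estimated OR = {estimated_or:.1f}")
```

Because the same error rates are applied to cases and controls, the sketch describes misclassification that does not depend on disease status, which is the situation assumed in the example.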

Evaluating the Magnitude and Direction of Information Bias. To evaluate the magnitude and direction of information bias, the sources of measurement (or classification) error must be identified. The magnitude of the bias depends on the magnitude of the error in measuring (or classifying) variables. The greater the rate of error, the greater the bias. The direction of the bias depends on how the measurement (or classification) error is distributed in the data. (A discussion of these issues was presented by Copeland et al. (7), Greenland (8), Greenland and Robins (9), Poole (10), Dosemeci et al. (11), and in the letters in the August 15, 1991, issue of the American Journal of Epidemiology.)
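Both statements can be illustrated numerically with the lung cancer table from the hypothetical example. The following Python sketch is only an illustration under stated assumptions: the function `observed_or` is hypothetical, specificity is assumed to be perfect as in the example, and the scenario in which cases and controls have different error rates is invented here and does not appear in the chapter.

```python
def observed_or(sens_cases, sens_controls):
    """Observed OR for the example table (200/50 cases, 200/200 controls)
    when a fraction (1 - sensitivity) of truly exposed subjects is
    classified as unexposed. Specificity is assumed to be perfect."""
    case_exp, case_unexp = 200, 50
    ctrl_exp, ctrl_unexp = 200, 200
    a = case_exp * sens_cases                       # cases classified exposed
    b = case_unexp + case_exp * (1 - sens_cases)    # cases classified unexposed
    c = ctrl_exp * sens_controls                    # controls classified exposed
    d = ctrl_unexp + ctrl_exp * (1 - sens_controls) # controls classified unexposed
    return (a * d) / (b * c)


# Same error rate in cases and controls: the higher the error rate,
# the further the observed OR falls below the true value of 4.0.
sensitivities = (1.0, 0.9, 0.7, 0.5)
nondifferential = [observed_or(s, s) for s in sensitivities]
for s, value in zip(sensitivities, nondifferential):
    print(f"error rate {1 - s:.0%}: observed OR = {value:.2f}")

# Hypothetical differential error: cases are classified perfectly, but
# half of the exposed controls are classified as unexposed.
differential = observed_or(sens_cases=1.0, sens_controls=0.5)
print(f"differential error: observed OR = {differential:.2f}")
```

In this sketch the equal-error scenarios pull the estimate toward 1.0, whereas the unequal-error scenario pushes it above the true value, which is one way the distribution of error can change the direction of the bias.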

Confounding

The most meaningful effect estimates one obtains from epidemiological data have two characteristics: (1) they directly compare the occurrence of disease in two groups that have different levels of an exposure; (2) they isolate the effect of the exposure from other factors that might also influence the disease of interest (factors to be controlled, not studied). For example, an investigator might want to estimate how the rate of disease increases as some occupational exposure increases, adjusted for the effects of other exposures or personal characteristics that also affect the disease of interest. These comparison estimates usually take the form of ratios or differences of disease rates or risks (e.g., "relative risk" and "attributable risk" measures).

When one computes these comparison estimates, a fundamental assumption is made: One assumes that disease occurrence among the unexposed accurately predicts what would have happened in the exposed group if they had not been exposed. The central idea is this: One assumes that in the absence of exposure, disease occurrence would be the same in both groups (12, 13).

If this assumption is incorrect, the observed comparison between exposure groups is confounded (i.e., confounding exists). That is, the estimate of effect reflects not only the effect of exposure but also the effects of other factors that influence disease occurrence (i.e., factors to be controlled, not studied). The effect estimate will be a biased measure of the impact of the exposure alone on disease occurrence.

How can one tell if confounding is present? Answer this question: Would disease occurrence be the same in the two groups if the exposure was absent? If the answer is no, then confounding is present. To find the answer, we search for differences between exposure groups in the distributions of extraneous risk factors for the disease (i.e., risk factors to be controlled, not studied). We search for confounders.

Hypothetical Example. Consider a fixed-cohort study of the relationship of nitroglycerin (an occupational chemical exposure) to death from myocardial infarction (MI). Let us assume the following:

1. Regular exercise protects against MI death.
2. Of the exposed subjects, 95% exercise regularly, whereas only 50% of the unexposed subjects exercise regularly.

In this hypothetical example, exercise confounds a simple comparison of nitroglycerin-exposed versus -unexposed groups. If nitroglycerin exposure had no effect on MI death, we should not expect the occurrence of MI deaths to be the same in the exposed and unexposed groups. In fact, we should expect that MI deaths will occur less frequently among exposed subjects [because of assumptions (1) and (2)].

The following is a hypothetical data set with exercisers and nonexercisers combined:

Exercisers and Nonexercisers Combined

Exposure Category    Deaths    Nondeaths      Total
Exposed                1200      208,800    210,000
Unexposed              1250      198,750    200,000

The biased (confounded) relative risk (RR) is

$$\mathrm{RR} = \frac{1200/210{,}000}{1250/200{,}000} = 0.9$$


It appears that nitroglycerin exposure multiplies the risk of MI death by a factor of 0.9 (i.e., nitroglycerin exposure is slightly protective against MI death). This estimate is biased because of the confounding effect of exercise. If we examine exercisers and nonexercisers separately (i.e., if we control for confounding by exercise), we can compute unconfounded estimates of effect:

Exercisers

Exposure Category    Deaths    Nondeaths      Total
Exposed                1000      199,000    200,000
Unexposed               250       99,750    100,000

$$\mathrm{RR} = \frac{1000/200{,}000}{250/100{,}000} = 2.0$$

Nonexercisers

Exposure Category    Deaths    Nondeaths      Total
Exposed                 200         9800     10,000
Unexposed              1000       99,000    100,000

$$\mathrm{RR} = \frac{200/10{,}000}{1000/100{,}000} = 2.0$$

The correct RR is 2.0 for both exercisers and nonexercisers. Exposure to nitroglycerin doubles the risk of MI death. The exposure is not protective against MI death, as the confounded estimate would have led us to believe.

Control of Confounding. For confounding factors that have been measured, an investigator can use study design and data analysis techniques to eliminate confounding. Study design techniques include the following:

1. randomly allocating subjects to exposure groups (experiment) (This strategy also helps control unmeasured confounders.)
2. restricting the eligibility of subjects according to values of the potential confounders
3. matching in a cohort study (i.e., selecting subjects so that exposed and unexposed groups have similar distributions of potential confounding factors)


Data analysis techniques include the following:

1. stratified analysis methods
2. multivariate modeling techniques
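As an illustration of a stratified analysis, the following Python sketch applies one common stratified method, the Mantel-Haenszel summary relative risk, to the hypothetical nitroglycerin data. The chapter names stratified analysis only in general terms, so the choice of the Mantel-Haenszel estimator and the function names `risk_ratio` and `mantel_haenszel_rr` are assumptions made here.

```python
def risk_ratio(deaths_exp, total_exp, deaths_unexp, total_unexp):
    """Relative risk: risk among the exposed divided by risk among the unexposed."""
    return (deaths_exp / total_exp) / (deaths_unexp / total_unexp)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary relative risk for cumulative-incidence data.

    Each stratum is (deaths_exp, total_exp, deaths_unexp, total_unexp).
    """
    numerator = 0.0
    denominator = 0.0
    for deaths_exp, total_exp, deaths_unexp, total_unexp in strata:
        stratum_total = total_exp + total_unexp
        numerator += deaths_exp * total_unexp / stratum_total
        denominator += deaths_unexp * total_exp / stratum_total
    return numerator / denominator


# Hypothetical nitroglycerin data, one tuple per exercise stratum.
strata = {
    "exercisers": (1000, 200_000, 250, 100_000),
    "nonexercisers": (200, 10_000, 1000, 100_000),
}

# Collapsing the strata reproduces the combined (confounded) table.
combined = tuple(sum(column) for column in zip(*strata.values()))
crude_rr = risk_ratio(*combined)

stratum_rrs = {name: risk_ratio(*counts) for name, counts in strata.items()}
summary_rr = mantel_haenszel_rr(strata.values())

print(f"crude RR: {crude_rr:.2f}")
for name, rr in stratum_rrs.items():
    print(f"{name} RR: {rr:.2f}")
print(f"Mantel-Haenszel RR: {summary_rr:.2f}")
```

With these data the crude estimate is about 0.91, while both stratum-specific estimates and the summary estimate equal 2.0, in agreement with the hypothetical example.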

Judging the Magnitude and Direction of Confounding. To evaluate confounding due to unmeasured confounders, an investigator must first identify potential confounding factors that were not measured. A discussion of this issue was given by Miettinen and Cook (12), Greenland and Robins (13), Rothman (1), and Kass and Greenland (14). The magnitude of the bias depends on how strongly the confounder affects disease occurrence and on how strongly the confounder and the exposure are associated. The stronger the confounder-disease relationship and the stronger the exposure-confounder association, the greater the bias will be. The direction of the bias depends on the directions of these associations. A more detailed discussion was given by Flanders and Khoury (15).
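The dependence of the bias on these two associations can be sketched for the simplest case of a single binary confounder. The Python example below uses a standard external-adjustment formula that is not given in the chapter and that assumes the confounder multiplies disease risk by the same factor in exposed and unexposed subjects; the function name `confounding_factor` and its parameter names are hypothetical.

```python
def confounding_factor(p_conf_exposed, p_conf_unexposed, rr_conf_disease):
    """Ratio of the crude RR to the true RR produced by one binary confounder.

    p_conf_exposed:   prevalence of the confounder among the exposed
    p_conf_unexposed: prevalence of the confounder among the unexposed
    rr_conf_disease:  relative risk of disease for the confounder itself
    Assumes the confounder's relative risk is the same in both exposure groups.
    """
    numerator = p_conf_exposed * (rr_conf_disease - 1) + 1
    denominator = p_conf_unexposed * (rr_conf_disease - 1) + 1
    return numerator / denominator


# Values taken from the hypothetical nitroglycerin example.
p_exercise_exposed = 200_000 / 210_000                # about 95% exercise
p_exercise_unexposed = 100_000 / 200_000              # 50% exercise
rr_exercise = (250 / 100_000) / (1000 / 100_000)      # exercise vs. none, unexposed

factor = confounding_factor(p_exercise_exposed, p_exercise_unexposed, rr_exercise)
true_rr = 2.0
expected_crude_rr = true_rr * factor
print(f"bias factor = {factor:.3f}, expected crude RR = {expected_crude_rr:.2f}")

# No bias if the confounder is unrelated to exposure or to disease.
no_exposure_link = confounding_factor(0.5, 0.5, rr_exercise)
no_disease_link = confounding_factor(p_exercise_exposed, p_exercise_unexposed, 1.0)
```

Under these assumptions the factor is below 1 because a protective confounder is more common among the exposed, which reproduces the downward bias seen in the combined table.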

Selection Bias

In a case-control study, bias due to the way in which cases or controls are selected for study is called selection bias. A case-control study can yield an unbiased estimate of the effect of exposure on disease occurrence if the following two conditions are met (1, 4):

1. The sample of cases gives an unbiased estimate of the exposure distribution among cases in the source population over the study period.
2. The sample of controls gives an unbiased estimate of the exposure distribution in the population at risk over the study period.

If these two conditions are not met because of the way cases or controls are selected for study, selection bias may exist, and estimates of effect may be incorrect.

Unfortunately, biased samples of cases and controls can occur for many reasons, which is why selection bias should be considered when interpreting epidemiological studies. For example, if exposed cases refuse to participate more often than unexposed cases, the studied cases will not provide an unbiased estimate of the exposure distribution among cases in the population during the study period. For another example, if all exposed cases are detected, but some unexposed cases are undetected, the cases studied will not provide an unbiased estimate of the exposure distribution among cases in the population during the study period. For a final example, if exposed controls refuse to participate more often than unexposed controls, the controls studied will not provide an unbiased estimate of the exposure distribution among the population at risk during the study period.

Hypothetical Example. Consider the following hypothetical population data for the relationship between a chemical exposure and lung cancer:

Source Population (Truth)

Exposure Category    Case    Noncase
Exposed               400       9600
Unexposed             600     14,400

In this population, OR is

$$\mathrm{OR} = \frac{400 \times 14{,}400}{600 \times 9600} = 1.0$$

Exposure to the chemical is not a risk factor for lung cancer in the source population.

Consider a case-control study of this population. Assume that, unknown to the investigator, subjects were sampled into the study with the following selection probabilities:

Probabilities of Being Selected into Study

Exposure Category      Case    Noncase
Exposed              0.4875     0.0146
Unexposed            0.2583     0.0146

Here are the data one would expect with the foregoing selection probabilities:

Study Data

Exposure Category    Case    Control
Exposed               195        140
Unexposed             155        210

The exposed and unexposed controls have equal selection probabilities, and so controls will give an unbiased estimate of the exposure distribution among the population at risk. In the population, 9600/24,000 = 0.40 of the noncases were exposed to the chemical. From the controls in the case-control data, we would correctly estimate that 140/350 = 0.40 of the population at risk was exposed to the chemical.

The exposed and unexposed cases have unequal selection probabilities, perhaps because unexposed cases refused to participate in the study more often than exposed cases. Because of the unequal selection probabilities, the cases in the study will not give an unbiased estimate of the exposure distribution among cases in the population. In the population, 400/1000 = 0.40 of the cases were exposed to the chemical. From the cases in the case-control study, we would incorrectly estimate that 195/350 = 0.56 of the cases in the population were exposed to the chemical. The biased OR,

$$\mathrm{OR} = \frac{195 \times 210}{155 \times 140} = 1.9$$

is not equal to the true OR (1.0) because of selection bias. Selection bias would lead us to conclude incorrectly that chemical exposure increases the risk of lung cancer.
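The steps of this example can be traced in a few lines of Python. This is a minimal sketch: the dictionary layout and variable names are assumptions made for illustration, while the population counts and selection probabilities are those of the hypothetical example.

```python
# Source population counts, keyed by (disease status, exposure status).
population = {
    ("case", "exposed"): 400,
    ("case", "unexposed"): 600,
    ("noncase", "exposed"): 9600,
    ("noncase", "unexposed"): 14_400,
}

# Selection probabilities, unknown to the investigator in the example.
selection_probability = {
    ("case", "exposed"): 0.4875,
    ("case", "unexposed"): 0.2583,
    ("noncase", "exposed"): 0.0146,
    ("noncase", "unexposed"): 0.0146,
}


def odds_ratio(table):
    """Odds ratio from a dict keyed by (disease status, exposure status)."""
    return (table[("case", "exposed")] * table[("noncase", "unexposed")]) / (
        table[("case", "unexposed")] * table[("noncase", "exposed")]
    )


# Expected number of subjects sampled from each cell of the population.
expected_study = {
    cell: population[cell] * selection_probability[cell] for cell in population
}

population_or = odds_ratio(population)
study_or = odds_ratio(expected_study)

for cell, count in expected_study.items():
    print(cell, round(count))
print(f"population OR = {population_or:.1f}, study OR = {study_or:.1f}")
```

Because the two control probabilities are equal, they cancel in the odds ratio, and the study OR equals the ratio of the two case selection probabilities (0.4875/0.2583).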

Controlling Selection Bias. If we can identify (and measure) a factor that affects the chance of selection into the study, we can adjust for this factor in the analysis and can remove selection bias.

Evaluating the Magnitude and Direction of Selection Bias. Making inferences about the magnitude and direction of selection bias is difficult. First an investigator must identify reasons why the sample of cases or the sample of controls is biased; in practice, this is exceedingly difficult to do. One can then perform an evaluation using sensitivity analysis (i.e., a "What if?" analysis based on how one thinks selection might be biased) [e.g., Maclure and Hankinson (16)].
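A "What if?" analysis of this kind might be sketched in Python as follows. The approach shown, dividing the observed OR by a bias factor built from assumed selection probabilities, is one simple possibility and is not claimed to be the method of reference 16; the function name `selection_bias_factor` and the first two scenarios are hypothetical. The observed counts are the study data from the hypothetical example.

```python
def selection_bias_factor(s_case_exp, s_case_unexp, s_ctrl_exp, s_ctrl_unexp):
    """Factor by which selection multiplies the true OR, given the selection
    probability of each exposure-by-disease cell (only their ratios matter)."""
    return (s_case_exp * s_ctrl_unexp) / (s_case_unexp * s_ctrl_exp)


# Observed OR from the study data of the hypothetical example.
observed_or = (195 * 210) / (155 * 140)

# Each scenario: (exposed cases, unexposed cases, exposed controls,
# unexposed controls). The first two are invented for illustration.
scenarios = {
    "no selection bias": (1.0, 1.0, 1.0, 1.0),
    "unexposed cases participate 80% as often": (1.0, 0.8, 1.0, 1.0),
    "probabilities from the example": (0.4875, 0.2583, 0.0146, 0.0146),
}

adjusted = {
    label: observed_or / selection_bias_factor(*probabilities)
    for label, probabilities in scenarios.items()
}

for label, value in adjusted.items():
    print(f"{label}: adjusted OR = {value:.2f}")
```

The spread of adjusted values across plausible scenarios gives a rough sense of how far selection could have moved the estimate; in this sketch the example's own probabilities return an adjusted OR close to the true value of 1.0.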

Summary

Interpreting the results of an epidemiological study is like shaping a rough diamond into a finished gem. The rough diamond is like the potentially biased estimates of effect one obtains from an observational epidemiological study. Because of the observational nature of most epidemiological studies, results are almost always biased to some degree. Shaping the diamond is like making inferences about the magnitude and direction of potential biases. The most important biases in epidemiological studies are information bias, confounding, and selection bias. The finished stone is like the combination of estimates of effect and inferences about biases.


Acknowledgments

The author thanks Sander Greenland, Jack Mandel, and the students of PubH 8194, Spring Quarter, 1992, University of Minnesota School of Public Health, for their comments.

References

1. Rothman, K. J. Modern Epidemiology; Little, Brown: Boston, MA, 1986.
2. Kelsey, J. L.; Thompson, W. D.; Evans, A. S. Methods in Observational Epidemiology; Oxford University: New York, 1986.
3. Checkoway, H.; Pearce, N. E.; Crawford-Brown, D. J. Research Methods in Occupational Epidemiology; Oxford University: New York, 1989.
4. Greenland, S. In Oxford Textbook of Public Health, 2nd ed.; Holland, W. W.; Detels, R., Eds.; Oxford University: New York, 1991; Vol. 2, Methods of Public Health.
5. Ahlbom, A.; Norell, S. Introduction to Modern Epidemiology; Epidemiology Resources: Chestnut Hill, PA, 1990.
6. Walker, A. M. Observation and Inference; Epidemiology Resources: Chestnut Hill, PA, 1991.
7. Copeland, K. T.; Checkoway, H.; McMichael, A. J.; Holbrook, R. H. Am. J. Epidemiol. 1977, 105, 488-495.
8. Greenland, S. Am. J. Epidemiol. 1980, 112, 564-569.
9. Greenland, S.; Robins, J. M. Am. J. Epidemiol. 1985, 122, 495-506.
10. Poole, C. Am. J. Epidemiol. 1985, 122, 508.
11. Dosemeci, M.; Wacholder, S.; Lubin, J. H. Am. J. Epidemiol. 1990, 132, 746-748.
12. Miettinen, O. S.; Cook, F. Am. J. Epidemiol. 1981, 114, 593-603.
13. Greenland, S.; Robins, J. M. Int. J. Epidemiol. 1986, 15, 413-419.
14. Kass, P. H.; Greenland, S. J. Am. Vet. Med. Assoc. 1991, 199, 1569-1573.
15. Flanders, W. D.; Khoury, M. J. Epidemiology 1990, 1, 239-246.
16. Maclure, M.; Hankinson, S. Epidemiology 1990, 1, 441-447.

RECEIVED for review September 3, 1992. ACCEPTED revised manuscript December 22, 1992.