So, You Want to Look for Biomarkers (Introduction to the Special Biomarkers Issue)† Joshua LaBaer* Harvard Institute of Proteomics, Harvard Medical School, 320 Charles Street, Cambridge, Massachusetts 02141 Received May 2, 2005

The burgeoning field of proteomics plays a powerful and relevant role in the discovery of biomarkers, which are biometric measurements that convey information about the biological condition of the subject being tested. Biomarkers have changed the manner in which we diagnose disease, monitor the effect of therapy, classify disease, detect toxicity, and develop new drugs. The central part that proteins command in both disease etiology and treatment makes them prime biomarker candidates. Indeed, the majority of clinical tests in use today measure proteins. This perspective introduces the Journal of Proteome Research Special Issue on Proteomics and Biomarkers. It outlines the major applications of biomarkers, discusses the basics of statistically assessing them, and considers the crucial choice of sample type. Central considerations of biomarker discovery and validation, particularly with respect to their intended clinical and research applications, are highlighted.

Keywords: biomarkers • clinical proteomics • sensitivity • specificity • statistical considerations • toxicity • drugs

† Part of the Biomarkers special issue.
* To whom correspondence should be addressed. E-mail: josh@hms.harvard.edu.
10.1021/pr0501259 CCC: $30.25 © 2005 American Chemical Society. Journal of Proteome Research 2005, 4, 1053-1059. Published on Web 06/30/2005.

Few topics garner as much attention and excitement in proteomics as its application toward the discovery of biomarkers. Several large programs have been established or are under consideration at the National Institutes of Health (NIH) to develop and employ proteomics technologies for biomarker discovery and validation in areas ranging from infectious disease to cancer (see articles by Hartwell et al., Sheeley et al., and Srivastava and Srivastava in this issue). More than a dozen public or private companies now focus on developing technologies or services to find biomarkers.1 Pharmaceutical companies are considering when and how to employ proteomics biomarkers in the drug validation pipeline (see articles by Voshol et al. and Vitzthum et al. in this issue), and the U.S. Food and Drug Administration must consider what value to place on these discoveries in the context of approving clinical tests and drugs (see article by Hackett and Gutman in this issue). This excitement has spilled over into the conference planning industry, which will hold a dozen or more meetings on various aspects of the topic this year.

At their heart, biomarkers are tests. They are biometric measurements that convey information about the biological condition of the subject being tested. These measurements might be a quantitative readout of a specific analyte, sophisticated imaging studies, or, in the genomics and proteomics paradigms, measurements of multiple analytes combined into mathematical models.2 Biomarkers obtain their value from their ability to differentiate between two or more biological states. This ability to segregate distinguishes biomarkers from simple measurements. Determining the concentration of a particular protein in the cerebrospinal fluid (CSF) of Alzheimer’s patients is a clinical measurement. Once this measurement is backed by the appropriate clinical studies demonstrating that concentrations above a defined threshold predict the presence of the disease, it becomes a biomarker. The distinction is not trivial. Proteomic identifications are too frequently referred to as biomarkers before they have gone through the crucial validation steps needed to demonstrate that they are reliable predictors. Moreover, the process of converting a reliable predictor into a practical clinical test is even more daunting (see the article by Vitzthum et al. in this issue).

Applications of Biomarkers. Biomarkers find many applications in modern biology and medicine. From pregnancy tests to monitoring cholesterol levels, they already impact our daily lives, and the ongoing hunt for diagnostics and predictors of responses promises to affect our future in profound ways. The characteristics and qualities demanded of a biomarker depend on how it will be employed.

Monitoring Disease. Many biomarkers are used to monitor chronic diseases such as cancer (e.g., carcinoembryonic antigen (CEA),3 CA-125,3 etc.), diabetes (Hemoglobin-A1c 4), and autoimmune disease (Rheumatoid Factor).4 Clinicians may test these markers each time a patient visits to follow the disease status. This usage requires that some quantitative measurement of the biomarker (e.g., blood concentration or tumor volume on computed tomography (CT) scan) correlate with disease severity. Often, it is the trend from visit to visit, sometimes referred to as the biomarker velocity, not simply the absolute measurement, which tells the relevant story. For example, an elevated but unchanged tumor marker might indicate stable disease in a cancer patient, but if it starts to rise over several consecutive visits, this might suggest renewed tumor growth. An advantage of following serial measurements within a single patient is that significant variation in the absolute measurement of the biomarker can still be tolerated between patients. Moreover, even if the biomarker does not track with disease in all individuals, as long as its relationship to disease progression is statistically significant and there is a process to determine whether the biomarker is predictive for a given individual, it can still be used. For example, experience has shown that CEA correlates with tumor burden in many colon cancer patients, but this is not always true and the magnitude of the levels can vary significantly from patient to patient.5-7 However, a clinician can determine that CEA is useful for monitoring colon cancer in an individual patient by establishing that, for that patient, a rise in CEA accompanies tumor growth on CT scan.

Endpoint Analysis (Predicting Response). In cases where a biomarker’s level tracks with disease severity, it can also be used to monitor the efficacy of therapeutic intervention. This is especially useful if the test is either less expensive or reflects changes earlier than standard tests. This not only aids patients and their clinicians in treatment decisions, but also offers the promise of simplifying the execution of pre-clinical and clinical trials. One of the most expensive aspects of drug development is the time and cost of monitoring patients on a new therapy while waiting to determine if it is effective and safe. Thus, there is significant interest in the pharmaceutical industry in finding biomarkers that reflect drug response or toxicity earlier than waiting for clinical events or using current tests. Furthermore, surrogate markers (such as inhibition of phosphorylation of targets) may aid in determining optimal dosing of a drug. The quality demands placed on the biomarker in this setting depend on whether it is employed to justify the success of a new therapy or to recommend abandoning it.
Obviously, a biomarker must be very predictive if it is used as primary evidence to establish the efficacy or safety of a treatment because the consequence of error could lead to the use of an ineffective or even harmful therapy. In contrast, a lower predictive value might suffice if the biomarker is used early in development as a preliminary screen to abandon experimental therapies from further study. Here, the consequence of a false readout is the missed opportunity to develop what might have been a useful treatment. This latter strategy of using biomarkers to eliminate leads in early development is being increasingly adopted because the earlier an unsuccessful drug can be eliminated, the lower the overall cost of drug development.

Disease Diagnosis. Perhaps the most sought-after application of biomarkers is to diagnose disease. This is particularly true for diseases where early intervention improves the success of treatment and where it is felt that current tests do not detect disease early enough. It follows that early detection biomarkers have little clinical value if there is no established advantage to early intervention, though they may find application in basic research. The use of biomarkers to diagnose disease can be generally divided into those that are used for early detection of chronic diseases (e.g., diabetes and cancer) and those that are used in the acute clinical setting (e.g., myocardial infarction, stroke, and infection). Toward this end, there are a number of efforts to develop diagnostic biomarkers underway in the proteomics community, many of which are discussed in this issue, including cardiovascular disease (see Vivanco et al.), cancer (see Srivastava et al., Anderson and LaBaer, and Alaiya et al.) and lung diseases (see Marko Varga et al.). In general, the demands placed on biomarkers used for diagnosis are much higher than those used to monitor disease in existing patients. In this setting, quantitative values must be established that set the boundary between a positive and negative test. This requires that the measurements for individuals without the disease exhibit relatively little variation; otherwise, establishing a cutoff value will be difficult. One useful strategy is to link the biomarker test with other tests. This might involve doing tests in parallel or in series (see article by Vitzthum et al. in this issue). For example, the measurement of the prostate-specific antigen (PSA) in the blood might be done in parallel with physical examination by a physician (primary tests), and an ultrasound-guided biopsy (secondary test) might follow only if the results of the primary tests are suggestive.8 By linking tests together, the overall confidence in the result improves.

Subtype Classification. Another use of biomarkers enjoying growing attention is to segment a disease into multiple classes. This is especially true for genomic and proteomic studies where measurements are made for thousands of genes or proteins per sample, and clustering algorithms are applied to group samples together in a manner that might reflect biological variants of a disease. There are two general strategies here, and the investigator must consider which applies. The first is to find biomarkers that discriminate between known subclasses. Here, data must be available from individuals who have been separated by other methods into the relevant subclasses prior to the study, and a search can then be employed to find markers that distinguish among them. The second is to define novel subclasses in a group previously treated as homogeneous. Here, an unsupervised clustering of samples by their global expression pattern is performed, followed by an examination to determine if there is a correlation between the newly defined subclasses and their phenotypes.
This information is likely to lead to a growing understanding of the biological complexity that underlies many diseases. It is worth remembering, however, that from a clinical standpoint, the subclassification of diseases is only useful if it informs clinical decision making. It is not particularly helpful to subclassify a patient’s breast cancer, for example, if the classification scheme will not affect her choice or timing of therapies. The broad variety of applications for biomarkers, and the different qualities demanded of each, suggest the first rule of biomarkers:

Biomarker Rule #1. Define the Goals Clearly.

Basic Statistical Considerations for Biomarkers. As the most common applications of biomarkers focus on their use as clinical tests, it is worth revisiting some of the basic statistical considerations that apply to such tests.9 In a simplified representation, clinical tests have a positive or negative outcome, which is intended to reflect the true presence or absence of some clinical state. Even tests that result in a quantifiable readout, such as the concentration of an analyte in blood, are frequently reduced to binary tests by setting a threshold value above which the test is considered positive. Figure 1 is a representation of all of the possible outcomes for a simple binary clinical test for a population. This figure illustrates using a test as a diagnostic to predict the presence of disease, but could equally apply to other clinical states, such as response to a therapy, developing toxicity or carrying a genetic trait. The squares marked a and d reflect those patients for whom the test reflected the truth; a indicates the patients who have the disease and for whom the test was positive, and d indicates individuals who are disease free and tested negative, appropriately. In the real world, no clinical test is perfect and errors occur. False positives result when the test reads positive in the absence of true disease (represented by b in Figure 1). False negatives result when the test fails to read positive despite the true presence of disease (represented by c in Figure 1).

Figure 1. Representation of the possible outcomes of a clinical test. Clinical tests are used to reveal the presence or absence of a clinical state. The entire tested population is represented by the sum of all quadrants. This example reflects a test intended to detect the presence of disease, but could also be applied to testing any clinical state. (a) The population with the disease for whom the test is correctly positive; (b) false positives, the population without the disease for whom the test was incorrectly positive; (c) false negatives, the population with the disease for whom the test was incorrectly negative; and (d) the population without the disease for whom the test was correctly negative.

From a clinical or practical perspective, there are consequences of incorrect tests. False positives, for example, might result in costly and unnecessary further testing in a patient incorrectly informed that the disease is present. In such cases, patients may experience emotional angst or end up with unnecessary procedures. If the test were used as an end point for a clinical trial, then a false positive result might suggest falsely that the treatment is failing. The consequences of false negatives are generally even more concerning, potentially leading to the missed diagnosis of real disease, the missed detection of toxicity, and missed opportunity for therapeutic intervention. In thinking about the applications for a biomarker:

Biomarker Rule #2. Understand the Consequences of Being Wrong.

Sensitivity and Specificity. Two critical characteristics of all clinical tests are their sensitivity and specificity. Sensitivity is a measure of the ability of the test to identify a condition when it is present. In Figure 2, this is represented by a/(a + c), where a + c indicates all of the individuals in the population who actually have the disease and a is the number of those who test positive. Specificity is the ability of the test to rule out a condition when it is absent. It is represented by d/(b + d), where b + d represents all of the individuals who are disease free and d is the number of those who test negative. Although not always true, improving the sensitivity of a test may lead to decreasing its specificity, particularly if it involves choosing the threshold value for calling the test positive. Consider the data in Table 1, which describes using postprandial blood glucose concentration as a diagnostic test for diabetes.10 (Note that this test is not how diabetes is currently diagnosed.) If the investigator were to choose values of 80 mg/dL or above to indicate diabetes, then the test would successfully detect diabetes in 97% of true patients. However, almost 3/4 of those who did not actually have diabetes would also test positive. Conversely, if the investigator increased the stringency by choosing 160 mg/dL as the threshold needed for a positive test, then nearly everyone who was disease-free would test negative. But the test would fail to detect about half of those who had diabetes.

Figure 2. Representation of sensitivity and specificity. Sensitivity is the fractional measurement of the ability of the test to detect the disease when it is present. Specificity is the fractional measurement of the ability of the test to read negative in the absence of disease.

Table 1. Sensitivity and Specificity of Postprandial Blood Sugar as a Test for Diabetes Mellitus

postprandial blood sugar (mg/dL)    sensitivity (%)    specificity (%)
 70                                 98.6                 8.8
 80                                 97.1                25.5
 90                                 94.3                47.6
100                                 88.6                69.8
110                                 85.7                84.1
120                                 71.4                92.5
130                                 64.3                96.9
140                                 57.1                99.4
150                                 50.0                99.6
160                                 47.1                99.8
170                                 42.9               100.0
180                                 38.6               100.0
190                                 34.3               100.0
200                                 27.1               100.0

Predictive Value of Positive and Negative Tests. Sensitivity and specificity are inherent properties of the biomarker test. If they are well established, they will hold true regardless of the population tested. However, what matters most in clinical decision making are not sensitivity and specificity, but rather two other characteristics called the predictive value of a positive test (PVP) and the predictive value of a negative test (PVN). Unlike sensitivity and specificity, the PVP and PVN depend heavily on the population tested. If an individual tests positive, the predictive value of a positive test (PVP, also referred to as PPV) indicates the likelihood that he or she actually has the disease. In Figure 3, this is reflected by a/(a + b), where a + b reflects all the individuals who tested positive and a indicates the number of those who truly have the disease. If an individual tests negative, the predictive value of a negative test (PVN, also referred to as NPV) states the likelihood that he or she is actually disease free. In Figure 3, this is indicated by d/(c + d), where c + d reflects all the individuals who tested negative and d indicates those who are truly disease free.

Figure 3. Representation of predictive value. The predictive value of a positive test (PVP) measures the likelihood that a positive test has correctly predicted the outcome. The predictive value of a negative test (PVN) measures the likelihood that a negative test has correctly ruled out the outcome. Unlike sensitivity and specificity, which are inherent to the test itself, PVP and PVN depend on the likelihood (prevalence) of the outcome in the population being tested.

Importance of Population Choice when Using Tests. The PVP and the PVN will vary depending on the prevalence of the disease in the population that is being tested. This is best illustrated by considering an example such as predicting the presence of prostate cancer using prostate-specific antigen (PSA), a widely used test to monitor and detect prostate cancer.8 Prevalence refers to how common the disease is in the population; it is represented in Figure 3 by (a + c)/(a + b + c + d). The prevalence of prostate cancer in three different populations of men is indicated in Table 2.11 As indicated in this table, the likelihood of having prostate cancer among all men is very low (∼0.035%), whereas for the population of men in whom a suspicious nodule has been palpated, the risk of disease rises to 50%. For the purposes of this example, let us assume that the threshold values chosen for the PSA test provide a sensitivity of 70% and specificity of 90%. Using these values, the distribution of individuals expected for a population of 100 000 nodule-positive men is shown in Figure 4a. In this population, the prevalence of disease is very high; half of these men have the disease. Not surprisingly, the PVP for this population is also very high, ∼88%. That is, from a statistical perspective, an individual with both a palpable prostate nodule and a positive PSA test would have an 88% chance of having prostate cancer. If the same test were applied to a general population of 100 000 men (no age restriction, no data on clinical nodules), the distribution is indicated in Figure 4b. Here, a very different PVP is observed, and a positive test for these individuals is far less meaningful at only 0.2%. This is because the prevalence of prostate cancer among all men is very low. Explaining the meaning of test results requires not only knowledge of the characteristics of the test, but also of the population to which the tested individual belongs. From this, it is clear why some tests are only valuable when applied to a specific population.

Table 2. Prevalence of Prostate Cancer in Different Populations of Men

patient group                             cases/100 000
all men                                          35
men ≥ 75 y.o.                                   500
clinically suspicious nodule detected        50 000

Other Population Considerations. It is common during the biomarker discovery process to compare patients with advanced disease to healthy individuals.
This is based on the reasoning that biomarkers may be more abundant or easier to observe in patients with established disease. However, it is worth considering that patients with early disease or even individuals with “pre-disease” may produce different biomarkers from patients with well-established disease.12 Thus, if the goal for developing a biomarker is to detect early disease, it may be important to use the appropriate population during the discovery process. Selecting the appropriate control populations also requires some care. In many clinical settings, clinicians not only differentiate disease from health, but more frequently must distinguish among multiple possible diseases. For example, a patient presenting with gastrointestinal complaints might have bowel cancer, but is much more likely to have one of any number of benign intestinal conditions. If the investigator’s goal is to find a biomarker that detects colon cancer, she must ensure that it does not spuriously detect all bowel conditions. In this context, particular attention must be paid to inflammation, which is a common reactive process that occurs in a broad range of illnesses.

Figure 4. Effect of disease prevalence on the predictive value of tests. (a) In this example, half of a hypothetical population of 100 000 men who have a palpable prostate nodule will have prostate cancer (50 000). The distribution of PSA test results in this population is shown, assuming 70% sensitivity and 90% specificity. Computing the PVP reveals that it is 88% predictive for this population. (b) Prostate cancer is much less common when considering all men, whose distribution is shown here for the same test. The PVP here is only 0.2%. The predictive value of the test depends strongly on the prior likelihood of the condition in the population being tested.
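The dependence of predictive value on prevalence can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the original article, that rebuilds the four quadrants of Figure 1 from a prevalence, a sensitivity, and a specificity, and then evaluates the ratios defined in the text. The names `Outcomes` and `expected_outcomes` are hypothetical helpers introduced only for illustration; the inputs (70% sensitivity, 90% specificity, and the Table 2 prevalences) are the ones assumed in the PSA example.

```python
from dataclasses import dataclass


@dataclass
class Outcomes:
    """Expected counts for the four quadrants a, b, c, d of Figure 1."""
    a: float  # disease present, test positive (true positives)
    b: float  # disease absent, test positive (false positives)
    c: float  # disease present, test negative (false negatives)
    d: float  # disease absent, test negative (true negatives)

    def sensitivity(self) -> float:
        return self.a / (self.a + self.c)

    def specificity(self) -> float:
        return self.d / (self.b + self.d)

    def pvp(self) -> float:
        # Predictive value of a positive test
        return self.a / (self.a + self.b)

    def pvn(self) -> float:
        # Predictive value of a negative test
        return self.d / (self.c + self.d)

    def prevalence(self) -> float:
        return (self.a + self.c) / (self.a + self.b + self.c + self.d)


def expected_outcomes(population, cases_per_100k, sensitivity, specificity):
    """Hypothetical helper: expected quadrant counts for a tested population."""
    diseased = population * cases_per_100k / 100_000
    healthy = population - diseased
    a = diseased * sensitivity
    c = diseased - a
    d = healthy * specificity
    b = healthy - d
    return Outcomes(a, b, c, d)


# Same test (70% sensitivity, 90% specificity), two populations from Table 2.
nodule = expected_outcomes(100_000, 50_000, 0.70, 0.90)
all_men = expected_outcomes(100_000, 35, 0.70, 0.90)

print(f"nodule-positive men: PVP = {nodule.pvp():.1%}")  # 87.5%
print(f"all men:             PVP = {all_men.pvp():.2%}")  # 0.24%
```

Under these assumptions the sketch gives a PVP of 87.5% for nodule-positive men and about 0.24% for all men, in line with the rounded values of 88% and 0.2% quoted in the text, while sensitivity and specificity are identical in both populations.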

Biomarker Rule #3. Consider the Target and Control Populations Carefully.

Avoiding Statistical Pitfalls. Comparing the Means. It may be tempting to use criteria other than sensitivity and specificity to evaluate a potential biomarker test, but this can easily mislead the investigator. Comparing the mean value for a biomarker to determine statistical significance (e.g., by t test or ANOVA) is a commonly sought-after and mistaken substitute. In fact, displaying even highly confident “statistical differences” between the mean values for patients and normal controls does not demonstrate that a biometric will make a good biomarker. Biomarkers not only require statistical differences, but also the distribution of anticipated values for each of the tested conditions must be separate enough that sample assignment is not ambiguous. Figure 5 illustrates a hypothetical test for which the mean values for the disease and normal populations are statistically different, but it will be difficult to establish a cutoff value that cleanly separates the two populations. The range of values among the individuals in both groups overlaps too much.

Figure 5. Statistical significance and biomarkers. The average values for the disease and normal populations are statistically different. However, the range of values is so broad that it would be difficult to determine if values found in the uncertain zone belong to patients or healthy individuals.

A better method to determine if a biometric will make a good biomarker is to select a series of cutoff values for the test and determine the sensitivity and specificity for each value in the population. This can then be plotted in a receiver operating characteristic (ROC) curve to help in determining if a good cutoff value can be selected.13,14
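As an illustration of the cutoff-scanning approach, the following minimal Python sketch (not part of the original article) turns the Table 1 values into ROC points, estimates the area under the curve by the trapezoidal rule, and picks a cutoff. Two choices here are assumptions rather than anything the text prescribes: the curve is anchored at the trivial points (0, 0) and (1, 1), and the cutoff is chosen by Youden's J statistic (sensitivity + specificity - 1), which is only one of several possible selection criteria. The function names are hypothetical.

```python
# Table 1 as given in the text: (threshold in mg/dL, sensitivity %, specificity %)
TABLE_1 = [
    (70, 98.6, 8.8), (80, 97.1, 25.5), (90, 94.3, 47.6), (100, 88.6, 69.8),
    (110, 85.7, 84.1), (120, 71.4, 92.5), (130, 64.3, 96.9), (140, 57.1, 99.4),
    (150, 50.0, 99.6), (160, 47.1, 99.8), (170, 42.9, 100.0), (180, 38.6, 100.0),
    (190, 34.3, 100.0), (200, 27.1, 100.0),
]


def roc_points(table):
    """Return (false positive rate, true positive rate) pairs sorted by FPR."""
    points = [(1 - spec / 100, sens / 100) for _, sens, spec in table]
    # Assumption: anchor the curve at the "never positive" and
    # "always positive" tests so the area is defined over the full range.
    points += [(0.0, 0.0), (1.0, 1.0)]
    return sorted(points)


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


def best_by_youden(table):
    """Threshold maximizing sensitivity + specificity (Youden's J, an assumed criterion)."""
    return max(table, key=lambda row: row[1] + row[2])[0]


points = roc_points(TABLE_1)
print(f"AUC estimate: {auc(points):.3f}")
print(f"Cutoff chosen by Youden's J: {best_by_youden(TABLE_1)} mg/dL")  # 110 mg/dL
```

With the Table 1 data, the Youden criterion selects 110 mg/dL, the row where sensitivity (85.7%) and specificity (84.1%) are jointly highest. In practice the cutoff would also be weighed against the consequences of false positives and false negatives rather than chosen by a single statistic.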

Biomarker Rule #4. Focus on Developing a Sensitive and Specific Test, Not on Achieving Statistical Significance.

Overfitting or the ‘Omics Trap. Among the advantages often mentioned in the context of using genomics and proteomics to discover and use biomarkers is the ability to test many biomarkers simultaneously. There is a sound basis to the strategy for multiplexing tests in order to improve the value of the test (see article by Gillette et al. in this issue). If each tested biomarker behaves independently as a predictor of disease, then testing many simultaneously provides the opportunity to improve both sensitivity and specificity. However, the combination of multiple tests can also complicate the statistical analysis. This is particularly true when the number of variables measured (genes, proteins, peptides, etc.) approaches or even exceeds the number of individuals in the test population. As the number of variables increases, so does the likelihood of finding results that appear statistically significant by random chance. These false results arise in the form of false predictive patterns or as misleading interpretations of the data to find false individual predictors. Consider a group of 16 random individuals in a room. The flipping of a single coin would clearly not provide a useful predictor of each person’s sex. However, if each individual were to flip sixteen coins and note the outcome of each, a “predictive pattern” could be gleaned from these data that would predict sex for this group. This absurd example is an obvious oversimplification, easily dispelled by simply repeating the test, even on the same group. It is easier to be misled by genomics and proteomics data, which include the measurement of real values, usually continuous variables of some kind, from real samples that often detect true differences. The investigator must assess whether these differences reflect the condition that is being tested by the biomarker or other differences between samples (see article by Villanueva et al. in this issue). This is best accomplished by validating any predictive patterns on independent datasets, which must be separated from the discovery set before searching for the pattern.15,16 Most importantly, if predictive patterns are to be used, then it is essential to engage the help of biostatisticians familiar with managing such datasets in order to avoid overfitting the data.

A variation on this theme is referred to as false discovery.17,18 In the setting of a high variable to sample number ratio, it is not surprising that some variables will correlate significantly with the condition of interest by random chance. In the coin flipping example, imagine if each individual flipped a thousand coins and it was observed by chance that the 125th coin was frequently heads for males and frequently tails for females. It might be incorrectly concluded that the 125th coin was an important marker for sex. Once again, this mistake can be avoided by using the appropriate statistical methods for multiple variables and verifying any discoveries on independent validation sets.
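The coin-flipping thought experiment is easy to simulate. The following minimal Python sketch (not part of the original article) has 16 individuals each flip 1000 fair coins, searches for the single coin that best "predicts" sex in that discovery round, and then scores the same coin on a fresh round of flips. Several details are assumptions made for illustration: the group is taken to be 8 males and 8 females coded as 0 and 1, a fixed random seed is used for repeatability, and a coin is allowed to count as predictive in either direction (the `inverted` flag). All function names are hypothetical.

```python
import random

N_PEOPLE, N_COINS = 16, 1000


def flip_all(rng):
    """One round: every person flips every coin once (0 = tails, 1 = heads)."""
    return [[rng.randint(0, 1) for _ in range(N_COINS)] for _ in range(N_PEOPLE)]


def accuracy(flips, sex, coin, inverted):
    """Fraction of people whose (optionally inverted) coin outcome equals their sex code."""
    hits = sum((row[coin] ^ inverted) == s for row, s in zip(flips, sex))
    return hits / len(sex)


def discover(flips, sex):
    """Search all coins, in both directions, for the best apparent predictor."""
    candidates = [
        (accuracy(flips, sex, coin, inverted), coin, inverted)
        for coin in range(N_COINS)
        for inverted in (0, 1)
    ]
    return max(candidates)


rng = random.Random(0)          # assumed seed, for repeatability only
sex = [0] * 8 + [1] * 8         # assumed 8/8 split of the 16 individuals

discovery_flips = flip_all(rng)
disc_acc, coin, inverted = discover(discovery_flips, sex)

validation_flips = flip_all(rng)  # independent repeat of the experiment
val_acc = accuracy(validation_flips, sex, coin, inverted)

print(f"best coin in discovery round: #{coin}, apparent accuracy {disc_acc:.0%}")
print(f"same coin in validation round: accuracy {val_acc:.0%}")
```

Because the search covers a thousand candidates for only sixteen samples, the best coin almost always looks highly accurate in the discovery round, yet on the independent round its expected accuracy is no better than 50%. This is the reason the validation set must be fixed before the search begins.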

Biomarker Rule #5. Choose the Test and Validation Groups before Searching for Patterns or Biomarkers.

Biomarker Rule #6. If the Biomarker is a Proteomic or Gene Expression Pattern, Get a Good Biostatistician.

Sample Considerations. Sample Choice. Among the most important considerations in biomarker discovery is determining what samples to use. This will depend on both practical and technical considerations. If the intention is to develop a biomarker for screening purposes, then a priority is placed on easily accessible fluids such as blood, saliva, urine, and possibly CSF. The advantage of these fluids is that they involve minimal risk and cost to obtain. However, relevant biomarkers may be present at very low concentration and masked by much more abundant unrelated proteins (see article by Jacobs et al. in this issue). The use of new instrumentation, new peptide digestion strategies, new protein separation methods and improved software may help to solve some of these challenges (see articles by Gillette et al. and Wu et al. in this issue). In addition to finding rare analytes in blood, there is an increased appreciation that the immune system itself can be used to supply biomarkers that betray the presence and progression of disease (see article by Anderson and LaBaer in this issue).

Tissue specimens provide a rich source of relevant analytes, making them useful for biomarker discovery, particularly for basic research applications (see article by Vivanco et al. in this issue). Given the costs and risks of doing biopsies, as a practical matter, the use of tissue for routine clinical testing is usually limited to diseases with significant morbidity and then only if the patient’s diagnosis has been established or is highly suspected (e.g., cardiac biopsies in transplant patients or diagnosis of cancer).

Alternatively, imaging studies are also used with increased frequency as biomarkers. New technologies have enabled rich and unprecedented information to be gleaned from such tests (see articles by Reyzer and Caprioli and Chandra et al. in this issue). Although often less invasive than biopsies, imaging studies are frequently expensive, may be limited to very specialized centers, and may require the intravenous administration of specialized agents for visualization.

Biomarker Rule #7. The Risk and Discomfort Associated with Measuring a Biomarker Must be Outweighed by the Value of its Result.

An important real-world corollary of this rule recognizes that even relatively safe clinical tests cost money and must be justified.

Corollary 7.a. The Likelihood of Reimbursement for the Costs of a Clinical Test Depends on a Proven Value for the Test in Clinical Decision Making.

Sample Number. In the context of biomarker discovery, the decision regarding the number of samples required to obtain a result (positive or negative) depends on the study design, prevalence of disease, and predicted variability within a population. A good biostatistician will help focus the study design and assist with sample number calculations.

Clinical Research Considerations. All studies involving people, or samples derived from people, are subject to national, state, and local regulations. In the U.S., samples must be obtained after informed consent of the subject, using IRB-approved clinical protocols by trained investigators. Patient privacy guidelines (HIPAA) must be followed, and all research must conform to ethical standards. Special patient consent for genetic analysis may be required. The more invasive the test (such as a tumor biopsy), the more risk incurred by the patient. Even noninvasive tests, such as MRIs, involve patient time and discomfort, so that every test, every experiment, must be well-planned and justified. In addition, laboratory practices for using human material must conform to OSHA and biosafety standards, for the protection of laboratory personnel from potential biohazards.

Biomarker Rule #8. Remember, These Samples Come from People.

Experiments involving animal models and animal studies offer tremendous power in research for many reasons, including lower cost, access to large numbers, more control of variables, and broad tissue access. However, extrapolating the results to humans may not be simple. Animal experiments take place in genetically limited backgrounds under tightly controlled conditions. The inherent variability of human populations requires strict attention to potential confounding factors (e.g., smoking changes the levels of many analytes; bacon for breakfast alters serum cholesterol; polymorphisms in metabolizing enzymes alter analyte levels). Samples collected in Boston may not correlate with samples collected in Texas. Not all variables can be predicted or controlled, but repeated analysis of different patient samples, by different laboratories, under different conditions helps to define the extent of variability.

Corollary 8.a. Mice are not People.

Sample Processing. The reproducibility of the sample is a major consideration for sample choice. In the case of blood, for example, this can be affected by what time the blood is drawn, how long the sample sits, how it is stored, what type of tube is used, and many other factors (see article by Villanueva et al. in this issue). In addition to these factors, it must be determined what tissue is sampled and whether specialized techniques should be used, such as laser capture microdissection. If the sample requires processing prior to assay (such as protein separation), then it must be determined how the preparation will affect the reading. Is the preparation method robust and reproducible from sample to sample? Lab to lab? Reagent batch to reagent batch? Is the instrument used to measure the biomarker robust and reproducible? If any of these is in question, it may be difficult to establish good sensitivity and specificity. One of the unique challenges of blood is the exceptional dynamic range from the most abundant to the least abundant protein species. Various strategies are employed to process subproteomes with a high likelihood of containing relevant biomarkers, such as proteins that have undergone post-translational modification, or to focus on the immune system as a tool for biomarker discovery (see article by Anderson and LaBaer in this issue).

Table 3. Considerations for Developing Biomarkers

What is the intended use of the biomarker?
• Screening for diagnosis, monitoring disease progress, other?
• Alone, in parallel to other tests, in series with other tests?
• Clinical setting, research setting?

How will the biomarker test affect decision making?
• Will therapy choices be made based on the test?
• Will therapy be initiated based on the test?
• Will other tests be ordered?
• Will it be used to prognosticate?
• Will research decisions be made based on the biomarker test?
• Which direction will each outcome of the test point to?

What is the target population for the test?
• Which disease(s) will be studied?
• What are the best control populations? Other populations to consider? E.g., differentiating between colon cancer and benign bowel diseases.
• Will the test be used in adults, children, elderly, women, men, other specific groups?
• Will the test be used for existing patients, general population, high risk population?

What qualities are demanded of the test?
• What sensitivity and specificity are needed?
• What is the prevalence of the condition in the target population?

How will the biomarker test be applied?
• To establish the efficacy or safety of treatment?
• To eliminate unsuccessful treatments early in the development process?
• To lead to a commercial test?

What sample will be used?
• Easily accessible fluids? E.g., blood, urine, and saliva
• Tissue biopsy? Need specialized dissection?
• Imaging study?
• Does the sample need special processing?
• Can this be achieved reproducibly from sample to sample?
• What level of skill is required to process the samples?
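The questions in Table 3 about required sensitivity, specificity, and prevalence are linked: the same test can be far more or less informative depending on how common the condition is in the target population. The sketch below applies Bayes' rule to compute positive and negative predictive values from those three quantities. The function name and the example numbers are hypothetical and chosen only for illustration; they are not taken from this article.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied to a population.

    sensitivity: fraction of affected subjects who test positive
    specificity: fraction of unaffected subjects who test negative
    prevalence:  fraction of the target population that is affected
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fraction of the population falling in each outcome category.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    # Predictive values are undefined when no subject can receive that result.
    positives = true_pos + false_pos
    negatives = true_neg + false_neg
    ppv = true_pos / positives if positives > 0 else float("nan")
    npv = true_neg / negatives if negatives > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical test with 90% sensitivity and 90% specificity,
    # used for general screening (1% prevalence) and in a high-risk group (30%).
    for prevalence in (0.01, 0.30):
        ppv, npv = predictive_values(0.90, 0.90, prevalence)
        print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these illustrative numbers, the positive predictive value is well under 10% at 1% prevalence, so most positive results would be false positives. This is why the choice between a general population and a high risk population changes what sensitivity and specificity a test must achieve.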

Summary

Emerging technologies in the fields of genomics and proteomics are offering great promise for the discovery of new and useful biomarkers. Biomarkers, which seek to predict biological states, reach broadly across a variety of biometric measurements, ranging from complex imaging studies to the assay of analytes in blood. Increasing attention is now focused on the examination of post-translational modification, the assessment of many protein species simultaneously, the immune system, and metabolomics as promising areas for biomarker discovery. Regardless of the methods used, it is always useful to begin the search for biomarkers by considering how, when, and why they will be used (see Table 3).

Acknowledgment. I would like to thank Rebecca Gelman and Xin Lu for excellent teaching and advice on biostatistics and Karen Anderson, Xin Lu and Lyndsay Harris for critical reading of this manuscript. Thanks also to Kerri Kivolowitz for help preparing the manuscript.

References

(1) Baker, M. In biomarkers we trust? Nat. Biotechnol. 2005, 23, 297-304.
(2) Paik, S.; Shak, S.; Tang, G.; Kim, C.; Baker, J.; Cronin, M.; Baehner, F. L.; Walker, M. G.; Watson, D.; Park, T.; Hiller, W.; Fisher, E. R.; Wickerham, D. L.; Bryant, J.; Wolmark, N. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 2004, 351, 2817-2826.
(3) Clinical practice guidelines for the use of tumor markers in breast and colorectal cancer. Adopted on May 17, 1996 by the American Society of Clinical Oncology. J. Clin. Oncol. 1996, 14, 2843-2877.
(4) Kasper, D. L.; Braunwald, E.; Fauci, A. S.; Hauser, S. L.; Longo, D. L.; Jameson, J. L.; Isselbacher, K. J. Harrison’s Principles of Internal Medicine, 16th ed.; McGraw-Hill: New York, 2005.
(5) Bast, R. C.; Kufe, D. W.; Pollock, R. E.; Weichselbaum, R. R.; Holland, J. F.; III, E. F.; Gansler, T. S.; Society, A. C. Cancer Medicine, e.5 ed.; Hamilton; Ont.; Lewiston; BC Decker: New York, 2000; Vol. Section 29, 103, p 2546.
(6) Minton, J. P.; Martin, E. W., Jr. The use of serial CEA determinations to predict recurrence of colon cancer and when to do a second-look operation. Cancer 1978, 42, 1422-1427.
(7) Engstrom, P. F.; Benson, A. B., 3rd; Cohen, A.; Doroshow, J.; Kiel, K.; Niederhuber, J.; Roh, M.; Tempero, M. NCCN Colorectal Cancer Practice Guidelines. The National Comprehensive Cancer Network. Oncology (Huntingt) 1996, 10, 140-175.
(8) Hernandez, J.; Thompson, I. M. Prostate-specific antigen: a review of the validation of the most commonly used cancer biomarker. Cancer 2004, 101, 894-904.
(9) Gaddis, G. M.; Gaddis, M. L. Introduction to biostatistics: Part 3, Sensitivity, specificity, predictive value, and hypothesis testing. Ann. Emerg. Med. 1990, 19, 591-597.
(10) United States Public Health Service, Division of Special Health Services, Diabetes program guide. ed.; [n. d.]: [Washington], 1960; ‘Vol.’ 506, p iv., 68 p.
(11) Watson, R. A.; Tang, D. B. The predictive value of prostatic acid phosphatase as a screening test for prostatic cancer. N. Engl. J. Med. 1980, 303, 497-499.
(12) Porter, D.; Lahti-Domenici, J.; Keshaviah, A.; Bae, Y. K.; Argani, P.; Marks, J.; Richardson, A.; Cooper, A.; Strausberg, R.; Riggins, G. J.; Schnitt, S.; Gabrielson, E.; Gelman, R.; Polyak, K. Molecular markers in ductal carcinoma in situ of the breast. Mol. Cancer Res. 2003, 1, 362-375.
(13) Metz, C. E. Basic principles of ROC analysis. Semin. Nucl. Med. 1978, 8, 283-298.
(14) Hanley, J. A.; McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143, 29-36.
(15) Ambroise, C.; McLachlan, G. J. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc. Natl. Acad. Sci. U.S.A. 2002, 99, 6562-6566.
(16) Ransohoff, D. F. Rules of evidence for cancer molecular-marker discovery and validation. Nat. Rev. Cancer 2004, 4, 309-314.
(17) Benjamini, Y.; Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B 1995, 57, 289-300.
(18) Dudoit, S.; Shaffer, J.; Boldrick, J. Multiple hypothesis testing in microarray experiments. Stat. Sci. 2003, 18, 71-103.

PR0501259

Journal of Proteome Research • Vol. 4, No. 4, 2005 1059