Looking for Thom's Biomarkers with Proteomics - Journal of Proteome

Looking for Thom’s Biomarkers with Proteomics

Andrzej K. Drukier,†,‡ Ivan Grigoriev,† Larry R. Brown,‡ John E. Tomaszewski,§ Richard Sainsbury,| and Jasminka Godovac-Zimmermann*,⊥

BioTraces Inc., 12455 Sunrise Valley Dr., Herndon, Virginia 20171
OncoTraces Ltd, 3 George Street, London, W1U 3QG, United Kingdom
Department of Pathology and Laboratory Medicine, University of Pennsylvania, 3400 Spruce Street, Philadelphia, Pennsylvania 19104
Department of Surgery, University College London, 74 Huntley Street, London WC1E 6AU, United Kingdom
Department of Medicine, Rayne Institute, University College London, 5 University Street, London WC1E 6JF, United Kingdom

Received May 15, 2006

In recent years, large numbers of putative disease biomarkers have been identified. Combinations of protein biomarkers have been proposed to overcome the lack of single, magic-bullet identifiers of disease conditions. The number of biomarkers in a panel must be kept small to avoid the combinatorial explosion that requires very large, uneconomical sample cohorts for validation. Recent results on high sensitivity blood-based diagnostic proteomics (Godovac-Zimmermann, J. et al., J. Proteome Res. 2006) suggest that the keys to identifying useful panels include judicious application of physiological knowledge to choose appropriate combinations of local, tissue/disease markers and global, systemic markers and to use very high sensitivity protein detection. Biomarkers that show non-Gaussian landscapes reminiscent of Rene Thom’s multiple, stable-state landscapes seem to have the greatest predictive value for breast cancer (Godovac-Zimmermann, J. et al., J. Proteome Res. 2006).

Keywords: breast cancer • proteomics • immunoassay • blood • biomarkers • multiphoton detection

Journal of Proteome Research 2006, 5, 2046-2048. Published on Web 07/15/2006. DOI: 10.1021/pr060231q. © 2006 American Chemical Society.

* To whom correspondence should be addressed. † BioTraces Inc. ‡ OncoTraces Ltd. § University of Pennsylvania. | Department of Surgery, University College London. ⊥ Rayne Institute, University College London.

A few years ago, the number of known biomarkers of disease was very small, and almost all of them were high abundance proteins (ref 1). It was believed that these biomarkers should show Gaussian distributions over patient cohorts, whether healthy or diseased, and that the mean and/or the width of the Gaussian distributions should differ between healthy and disease populations. These first biomarkers were not very successful: their sensitivity (correct detection of disease) and selectivity (correct rejection of healthy) were generally no better than 70%. With the arrival of new methodologies that allowed screening of larger numbers of potential biomarkers, the hope was that magic-bullet biomarkers with much higher sensitivity and selectivity would be found. At the same time, the search for biomarkers was widened to include RNA, DNA, polysaccharides, and even small molecules. In the following, we limit our comments to diagnostic proteomics, although we believe that many of the considerations also apply to the other potential classes of biomarkers.

The situation has changed completely in recent years, and hundreds of putative protein disease biomarkers have now been proposed; for example, at least 80 different proteins have been proposed as potential biomarkers for cancer (at least 12 for breast cancer) (refs 2, 3). Lack of detection sensitivity means that most of these new biomarkers are still high abundance proteins. A common sentiment has been that with hundreds of thousands of proteins in typical eukaryotic proteomes (including transcriptional and post-translational isoforms), it should be possible to find good biomarkers among the “low-hanging fruit”. By now, it is clear that this expectation has not been met. Rather than the hoped-for magic bullets, what has been found is a plethora of new “70% successful” biomarkers (refs 4, 5). Indeed, although this innovation has led to massive numbers of new, putative protein biomarkers, the number of early detection procedures based on biomarkers that have been approved by the FDA actually diminished in the last 5 years (refs 6, 7).

Much current investigation is now directed at finding panels of different biomarkers that in combination will improve diagnostic sensitivity and selectivity, but the projected use of combinations of different biomarkers presents new challenges. If a panel of N biomarkers is chosen at random from a potentially much larger set of M proposed biomarkers, then the panel can be chosen in M!/[(M-N)! N!] different ways. Assuming independent variables and Gaussian statistics, if a single biomarker can be validated with S samples, then the number of samples needed to validate a panel of N biomarkers increases roughly as S^N, i.e., it tends to become prohibitively large. At the same time, if each single biomarker can only be measured for a fraction F of the samples, then the fraction of the samples for which all biomarkers in the panel can be measured decreases roughly as F^N, i.e., there are fewer and fewer patients for whom all of the biomarkers can be measured. The number of biomarkers in the panel might be increased so that some useful combination will be available for all patients, but this again increases the number of samples needed for validation. For more than a handful of biomarkers, these are daunting numbers; they suggest that randomized, high-throughput searches for combinations of biomarkers that can be adequately validated with Gaussian statistics may not be an efficient way to proceed. The historical legacies that guided us in the days when biomarkers were rare need to be updated.

Figure 1. Two-dimensional landscapes produced by wavelet processing for biomarker pairs. Ultrasensitive IA/MPD and Super-ELISA immunoassays were used to measure the abundances of the biomarkers in the blood of 95 healthy women (HW) and 159 breast cancer (BC) patients (5, 6). (A) Distribution of the biomarker pair IL-6/IL-8. (B) Distribution of the biomarker pair VEGF/PSA.

We have been able to find panels with small numbers of biomarkers (5-11) that show high promise for blood-based diagnosis of breast cancer (ref 9) as well as for other cancers (prostate, ovarian, melanoma) and diseases (e.g., Alzheimer's and other neurodegenerative diseases) (preliminary results). On the basis of this experience, we outline in the following three concepts that we have found useful in finding effective protein biomarker panels that can be used for diagnostic purposes.

Sensitivity of Detection. This is absolutely critical. It is essential that the individual proteins can be measured for all samples in both the healthy and disease cohorts, so that the number of biomarkers in the panel stays small enough to be feasibly validated. In the companion paper, we show that particular combinations of proteins may only have good predictive value if they can be measured for all samples (ref 9). At the same time, this guarantees that the panel is actually applicable to a high percentage of patients. Furthermore, the better the detection sensitivity, the greater the flexibility in choosing which protein biomarkers to include in the panel. This can be especially important for diagnostic panels aimed at bodily fluids (blood, urine, breath condensate, etc.), where many potentially interesting proteins may have very low abundance.
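The scaling relations stated earlier (the number of possible panels M!/[(M-N)! N!], the rough validation cost S^N, and the rough coverage F^N) can be made concrete with a short calculation. The sketch below is only an illustration under the text's own simplifying assumptions of independent, Gaussian markers. M = 80 echoes the "at least 80" proposed cancer biomarkers mentioned above, while S = 30 and F = 0.9 are hypothetical values chosen for illustration; the function names are likewise made up for this example.

```python
from math import comb


def panel_count(m, n):
    """Number of distinct N-marker panels from M candidates: M! / ((M - N)! N!)."""
    return comb(m, n)


def validation_samples(s, n):
    """Rough number of samples needed to validate an N-marker panel: S**N."""
    return s ** n


def measurable_fraction(f, n):
    """Rough fraction of samples in which all N markers can be measured: F**N."""
    return f ** n


# Illustrative inputs only. M follows the "at least 80" figure in the text;
# S and F are hypothetical assumptions, not values from the study.
M, S, F = 80, 30, 0.9

for n in (1, 2, 5, 11):
    print(
        f"N={n:2d}  panels={panel_count(M, n):,}  "
        f"samples~{validation_samples(S, n):,}  "
        f"coverage~{measurable_fraction(F, n):.2f}"
    )
```

Even with these modest assumed inputs, the panel count and the sample requirement grow very quickly with N while the coverage shrinks, which is the motivation for insisting that every marker be measurable in every sample (F close to 1) and for keeping N small.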
The 100- to 1000-fold improvements in the limits of detection that we have been able to achieve with new ultrasensitive IA/MPD and Super-ELISA immunoassay methods (ref 8) are crucial for measuring low abundance serum proteins, such as cytokines and angiogenic factors, that we have found to have high predictive value for breast and other cancers (ref 9).

Validation. One of the crucial results obtained with the ultrasensitive immunoassays is the demonstration that some of the most useful biomarkers do not show Gaussian distributions of abundance for either healthy or disease cohorts (ref 9, Figure 1 from ref 8). Common statistical methods for validation are usually applied when each biomarker has been shown to conform to a Gaussian distribution, but this may exclude some of the most informative biomarkers. We think that the non-Gaussian distributions are to be expected and are highly useful. Many years ago, Rene Thom proposed that there may be multiple, separate stable states in complex biological systems (ref 10). We have found that the distributions in blood of some biomarker pairs show separate “islands” or “continents” for both healthy and disease cohorts that are reminiscent of the landscapes proposed by Thom (Figure 1). A surprising outcome of our results is that it was these types of landscapes that were most useful in detection of breast and other cancers (ref 9). Combinations of higher abundance proteins that showed Gaussian distributions in blood had much less predictive value. The complex landscapes that have been observed required the development of new correlation methods to characterize distributions of pairs of biomarkers (ref 9). We have consciously limited the landscapes to two dimensions to avoid the types of “combinatorial explosions” noted above. Initial indications are that such two-dimensional landscapes can be validated for use in diagnostic biomarker panels with cohorts of surprisingly limited size. We think that this occurs because the landscapes for appropriate markers reflect exactly the kinds of multiple, stable states proposed by Thom.

Choice of Biomarkers. In many cases, we now have a glut of proposed biomarkers. Not all of these proposals are of equal quality, since many could not be measured for complete sample cohorts and/or have been measured only for very small cohorts. Alternative, high-throughput methods for biomarker discovery, such as transcriptomics on cell systems, often seem to have limited predictive value. This is presumably because of issues related to biomarker transfer to and elimination from fluids such as blood, and/or because cell systems are not good models for the behavior of integrated, organismal systems. In any case, the critical issue often seems not to be screening for yet more putative biomarkers.
The real challenge is now to identify highly predictive combinations of small numbers of already known protein biomarkers, even though the screening methods that have been in common use may require prohibitive numbers of patient samples for validation and/or may not have real statistical validity (non-Gaussian distributions). Our approach to culling high numbers of putative biomarkers combines three elements: (i) knowledge about cell biology and physiology; (ii) high-throughput screening at limited cohort coverage for selections of physiologically sensible biomarkers; and (iii) very high sensitivity immunoassay at complete cohort coverage for small numbers of highly promising biomarkers.

From physiology, we take the concept that many diseases are local phenomena within an organism, but that there are systemic responses at the whole organism level that are reflected in bodily fluids such as blood. Sensitivity to a specific disease requires the use of “local” tissue/disease markers in a diagnostic panel. Secreted proteins that readily enter blood are the obvious choice for this class of markers. On the other hand, there are systemic responses to disease; e.g., angiogenesis and immunological alterations are both important in the organismal response to cancer. Systemic markers such as growth factors, angiogenic factors, and cytokines are endemic in blood and may improve selectivity, but their very low concentrations require very high sensitivity assays for their measurement. High-throughput screening of a selection of tissue/disease and systemic markers that make physiological sense provides a limited set of promising proteins for a given disease. Subsequently, very high sensitivity immunoassay provides confirmation of which combinations of biomarkers have high predictive value (ref 9). Evaluation of the biomarker combinations using the types of correlation methods we have developed (ref 9) avoids the clear inadequacies of Gaussian statistics and allows taking advantage of apparent “Thom” biomarkers in a form that itself is already appropriate for application as a diagnostic tool.

In short, it appears that despite potentially daunting problems with the uneconomic/unmanageable cohort sizes needed for discovery and validation of biomarker panels, the high granularity of live human beings (organs, tissues, cell types, etc.) and of their systemic responses permits the discovery of efficient panels with limited numbers of biomarkers (ref 9). Both high-throughput proteomics screening and very high sensitivity proteomics detection and validation have crucial roles to play.

Journal of Proteome Research • Vol. 5, No. 8, 2006
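To illustrate why a Gaussian summary can miss the multi-state structure discussed in the Validation section, the following sketch compares a single-state and a two-state distribution that share the same mean. It is not the wavelet processing or the correlation method of ref 9, whose details are not given here. As a stand-in it uses a standard moment-based heuristic, the bimodality coefficient (skewness squared plus one, divided by kurtosis), and the abundance values are synthetic, generated deterministically for illustration rather than measured.

```python
from statistics import NormalDist, fmean


def bimodality_coefficient(values):
    """Moment-based bimodality coefficient: (skewness**2 + 1) / kurtosis.

    Uses population moments. A Gaussian gives about 1/3; values above 5/9
    (the value for a uniform distribution) hint at more than one mode.
    """
    n = len(values)
    mean = fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # non-excess kurtosis, 3 for a Gaussian
    return (skewness ** 2 + 1) / kurtosis


def quantile_sample(mu, sigma, n):
    """Deterministic stand-in for n Gaussian draws: evenly spaced quantiles."""
    dist = NormalDist(mu, sigma)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


# Synthetic abundances in arbitrary units. These are NOT measured data.
one_state = quantile_sample(5.0, 1.0, 200)
two_states = quantile_sample(2.0, 0.5, 100) + quantile_sample(8.0, 0.5, 100)

for label, sample in (("one state", one_state), ("two states", two_states)):
    print(
        f"{label:10s}  mean={fmean(sample):.2f}  "
        f"bimodality={bimodality_coefficient(sample):.2f}"
    )
```

Both synthetic cohorts have the same mean, so a comparison of means alone cannot separate them, whereas the shape statistic does. A single one-dimensional coefficient is of course far cruder than the two-dimensional landscapes of Figure 1; the point is only that shape, not just mean and width, can carry the diagnostic information.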

References
(1) Rosenblatt, K. P.; Bryant-Greenwood, P.; Killian, J. K.; Mehta, A.; Geho, D.; Espina, V.; Petricoin, E. F., 3rd; Liotta, L. A. Annu. Rev. Med. 2004, 55, 97-112.
(2) Rai, A. J.; Chan, D. W. Ann. N.Y. Acad. Sci. 2004, 1022, 286-294.
(3) Molina, R.; Barak, V.; van Dalen, A.; Duffy, M. J.; Einarsson, R.; Gion, M.; Goike, H.; Lamerz, R.; Nap, M.; Soletormos, G.; Stieber, P. Tumour Biol. 2005, 26, 281-293.
(4) Ross, J. S.; Linette, G. P.; Stec, J.; Clark, E.; Ayers, M.; Leschly, N.; Symmans, W. F.; Hortobagyi, G. N.; Pusztai, L. Expert Rev. Mol. Diagn. 2003, 3, 573-585.
(5) Ross, J. S.; Linette, G. P.; Stec, J.; Clark, E.; Ayers, M.; Leschly, N.; Symmans, W. F.; Hortobagyi, G. N.; Pusztai, L. Expert Rev. Mol. Diagn. 2004, 4, 169-188.
(6) http://www.fda.gov/oc/initiatives/criticalpath/reports/opp_list.pdf.
(7) http://www.fda.gov.
(8) Drukier, A. K.; Ossetrova, N.; Schors, E.; Brown, L. R.; Tomaszewski, J.; Sainsbury, R.; Godovac-Zimmermann, J. J. Proteome Res. 2005, 4, 2375-2378.
(9) Drukier, A. K.; Ossetrova, N.; Schors, E.; Krasik, G.; Grigoriev, I.; Koenig, C.; Sulkowski, M.; Holzman, J.; Brown, L. R.; Tomaszewski, J. E.; Schnall, M. D.; Sainsbury, R.; Lokshin, A. E.; Godovac-Zimmermann, J. J. Proteome Res. 2006, 5, 1906-1915.
(10) Thom, R. In Structural Stability and Morphogenesis: An Outline of a General Theory of Models; Westview Press: Boulder, CO, 1972.

