TECHNICAL NOTE pubs.acs.org/jpr

Drug Target Identification from Protein Dynamics using Quantitative Pathway Analysis

David M. Good and Roman A. Zubarev*

Chemistry I, Department of Molecular Biochemistry and Biophysics, Karolinska Institutet, Scheeles väg 2, 171 77 Stockholm, Sweden

Supporting Information

ABSTRACT: Dynamic proteomics promises to greatly facilitate identification of target proteins for drug molecules. Cohen et al. [Science, 2008, 322 (5907), 1511–1516] illustrated this potential by monitoring the responses of 812 fluorescently tagged proteins to camptothecin administration over 48 h. Directly from these data, one can restrict the list of candidate targets to 52 proteins. However, this approach has numerous limitations: equipment, labor (tagging and analyzing ≥1 colony/protein), and data analysis (aggregating individual cell data into population-relevant data sets). Furthermore, analytical success requires both explicit knowledge of drug target time-course evolution and, most importantly, monitoring of the target itself. To address these issues, we developed a quantitative pathway analysis (qPA) technique, which employs well-annotated signaling pathways and elucidates putative drug targets and other molecules of interest. qPA, using more general assumptions and only 3 out of 144 available time points, identified the single known camptothecin target, TOPI, among only a handful of putative targets. Importantly, identification was possible even when TOPI was not included in the input data. These results demonstrate the potential of qPA in drug target discovery and highlight the importance of systems biology approaches for analysis of proteomics data.

INTRODUCTION

In spite of recent technological advances, drug discovery has remained an exceedingly expensive venture, with research and development costs for each new molecular entity (NME) estimated at $1.8 billion.2 In addition, the rate of successful drug development has actually decreased in recent years, with 50% fewer NMEs gaining approval over the past 5 years versus the preceding 5 years.3 One of the main issues with the current drug development process is that the large majority of drugs fail during phase II and III trials,2 which is costly for the pharmaceutical company because human testing is involved. To counter this deficiency, Paul et al. argue for a shift in the development process toward a greater focus on preclinical development: the “quick win, fast fail” scheme.2 Perhaps the largest change that this scheme calls for is a shift from the current bare minimum of resources devoted to drug discovery to a much greater investment in this area of research. This approach would address one of the current bottlenecks in drug development, which is not only the identification of the target molecules for drug molecules but also the characterization of their mechanisms of action and surrounding pathways. It may also allow newly discovered targets to be integrated more quickly, as the related pathways are well documented. Pathway analysis allows for the concurrent identification of drug targets and provides a view onto the surrounding biological processes that are most affected by varying the target’s abundance, activity, etc.

© 2011 American Chemical Society

Dynamic proteomics promises to greatly facilitate this process. Recently, Cohen et al. employed this technique to study human cancer cell response to administration of the anticancer drug camptothecin.4 The abundances and localization of ∼1000 CD-tagged proteins5 were monitored every 20 min using fluorescence microscopy over a period of 48 h following drug administration. The only known camptothecin target, the enzyme Topoisomerase I (TOPI), was found among the first-responding and most-regulated proteins, though it could not be pinpointed as the drug target. Although the cited study was able to place the drug target within the top 1% of first-responding proteins, restricting the list of putative drug targets was only possible when a priori knowledge of the drug target dynamics was employed in the clustering analysis. Moreover, restriction of the original list of >1200 proteins to a more manageable 812 proteins was based on protein localization analysis. The applied approach is therefore costly in terms of equipment (fluorescence-enabled microscope with automation software and hardware), labor (tagging and analyzing a minimum of 1 colony/protein, with more colonies necessary to establish confidence levels), and data analysis (aggregating individual cell data into population-relevant data sets). In general, massive protein tagging is not conducive to high-throughput screening methodology and is therefore a less attractive solution for pharmaceutical research. Most importantly, analytical success requires monitoring of the target itself, while only a small percentage of the total expressed proteome is monitored in this type of proteomics experiment, inherently limiting the list of candidates in the direct analysis to observed proteins.

Here, we demonstrate that supplementing direct dynamics monitoring with quantitative pathway analysis (qPA) of dynamic proteomics data alleviates these deficiencies and identifies TOPI as one of only a handful of putative targets of camptothecin. In addition, qPA elucidates TOPI as one of only a few candidates even when it is not observed within the original analysis, and thus not present in the input list.

Received: January 31, 2011
Published: March 25, 2011

Journal of Proteome Research
dx.doi.org/10.1021/pr200090m | J. Proteome Res. 2011, 10, 2679–2683

Figure 1. Flow diagram for quantitative pathway analysis (qPA). (I) Proteomics data set is mapped onto the network of molecules inherent to the organism (input proteins highlighted in black). (II) Beginning from the input proteins, this network is searched upstream for key nodes (bottleneck molecules; highlighted in green). KN scores are then calculated and compared for different time points. (III) Fastest- and largest-changing KNs (purple) are selected for subsequent downstream analysis. The overlapping molecules (within red oval) are considered viable drug target candidates. As shown, one of the two candidates was not included within the original input data; this demonstrates the ability of qPA to elucidate drug targets which were not observed/monitored in the proteomics experiment.

MATERIALS AND METHODS

Input Data

The protein dynamics data from the fluorescence measurements of Cohen et al.4 were downloaded from the Kahn Dynamic Proteomics Project (http://www.weizmann.ac.il/mcb/UriAlon/DynamProt/papers.html). From the .mat files, a single .xls file was created containing protein identifiers and associated fluorescence values for each time point (0–2800 min).

Key Node Analysis

The data were loaded into the ExPlain pathway search tool (BioBase GmbH, Germany). Protein identifiers were automatically converted to gene identifiers. Key nodes (KNs, regulatory molecules found at intersections of signaling pathways) were searched in the upstream environs of the input genes (Figure 1); each KN is assigned a score based on its connectivity and input protein abundances.6 The KN analysis settings were: distance tolerance 4, penalty 8, following curated chains, using the fluorescence values as weighting factors. For each KN, the score changes with time were calculated as ΔS values representative of both the fold- and absolute changes:

$$\Delta S = (S_A - S_B) \times \log(S_A / S_B) \qquad (1)$$

where S_A and S_B are the scores of the KN at the two time points being compared.
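As a rough illustration of eq 1, the following Python sketch evaluates ΔS for a few made-up key-node scores. The function name `delta_s`, the example values, and the choice of the natural logarithm are assumptions made here for illustration; the text does not state the base of the logarithm, and the scores are required to be strictly positive so that the ratio is defined.

```python
import math


def delta_s(s_a: float, s_b: float) -> float:
    """Score change per eq 1: (S_A - S_B) * log(S_A / S_B).

    Assumptions (not stated in the text): natural logarithm, and
    strictly positive scores so that the ratio is defined.
    """
    if s_a <= 0 or s_b <= 0:
        raise ValueError("scores must be strictly positive")
    return (s_a - s_b) * math.log(s_a / s_b)


# Hypothetical key-node scores at two time points.
increase = delta_s(200.0, 100.0)   # score doubles
decrease = delta_s(100.0, 200.0)   # score halves

# The difference and the log-ratio change sign together, so an increase
# and the mirror-image decrease give the same non-negative value.
print(increase, decrease)

# Same 2-fold change with a 10x larger absolute difference.
print(delta_s(2000.0, 1000.0) / increase)

# An unchanged score contributes nothing.
print(delta_s(150.0, 150.0))
```

Because the difference and the logarithm of the ratio always share the same sign, swapping the two scores leaves the result unchanged. Scaling both scores by a common factor keeps the fold change fixed and scales ΔS by that factor, which is how the measure reflects both fold and absolute changes.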

Note that ΔS is positive regardless of the direction of abundance change. Two types of analysis were performed: speed, Δ(0–60), comparing time 0 with 60 min, and magnitude, Δ(0–2800), comparing time 0 with 2800 min. For each type of analysis, the KNs were ranked by their ΔS values, and the top KN was selected (Figure 1, II). From the two top-scoring molecules, downstream KN searches were performed (Figure 1, III) with the same parameters (distance 4, penalty 8) as for the upstream search. The overlap between the two lists of downstream KNs represented both the fastest and largest cumulative changes, and thus the overlapping proteins were taken as possible drug targets.
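The selection procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the ExPlain implementation: the key-node names, scores, and downstream hit lists are hypothetical, the downstream search itself is replaced by a lookup table, and the natural logarithm is assumed in eq 1 (the ranking does not depend on the base).

```python
import math

# Hypothetical key-node (KN) scores at the three time points used
# (0, 60, and 2800 min). Names and values are illustrative only.
kn_scores = {
    "KN_A": {0: 100.0, 60: 180.0, 2800: 120.0},
    "KN_B": {0: 100.0, 60: 105.0, 2800: 400.0},
    "KN_C": {0: 100.0, 60: 110.0, 2800: 150.0},
}

# Hypothetical downstream hit lists. In the paper these come from the
# pathway tool's downstream search, which is not reproduced here.
downstream = {
    "KN_A": {"P1", "P2", "P3", "P4"},
    "KN_B": {"P3", "P4", "P5"},
    "KN_C": {"P6"},
}


def delta_s(s_a, s_b):
    """Eq 1; natural log is an assumption (any base gives the same ranking)."""
    return (s_a - s_b) * math.log(s_a / s_b)


def top_kn(scores, t_start, t_end):
    """Return the KN with the largest delta S between two time points."""
    return max(
        scores,
        key=lambda kn: delta_s(scores[kn][t_end], scores[kn][t_start]),
    )


fastest = top_kn(kn_scores, 0, 60)     # speed analysis, delta(0-60)
largest = top_kn(kn_scores, 0, 2800)   # magnitude analysis, delta(0-2800)

# Candidates are the molecules downstream of both selected key nodes.
candidates = downstream[fastest] & downstream[largest]
print(fastest, largest, sorted(candidates))
```

With these toy values, the speed and magnitude analyses select different key nodes, and only the molecules shared by both downstream lists are kept as candidates. Taking the top few key nodes from each ranking instead of only the first would be expected to widen the candidate list.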

RESULTS AND DISCUSSION

We employed the previously developed qPA algorithm6 with only slight modification. Observed proteins were mapped onto known pathways via an upstream search (abundances as weighting factors), identifying regulatory molecules. For comparison with qPA, protein dynamics were analyzed directly using eq 1 (relative protein abundances replacing KN scores). The target, TOPI, was the 502nd fastest-changing protein (147th fastest decreasing) and the 89th largest-changing protein (84th most decreasing) (Figure 2A). When combining speed and magnitude, TOPI was ranked 233rd, though 52nd among proteins decreasing both over the first 60 min and over the entire analysis. In contrast, the two top KNs in qPA gave only 19 overlapping proteins from the input protein list, with TOPI as the 7th most likely target when known dynamics are considered (Figure 2B).

What if TOPI were not among the input list proteins? Removing TOPI from the input list renders it invisible to direct analysis. The qPA technique, however, involves searching downstream over the entire proteome and thus does not have this limitation. To test this unique qPA ability, the analysis was repeated without TOPI in the input list. This analysis resulted in 163 candidates, not merely including proteins with known dynamics, but all proteins known to be downstream of the respective key nodes, including TOPI. Now the dynamics of these 169 protein candidates can be measured in a targeted analysis. This analysis would rank TOPI as the ninth most likely putative target (Figure 2C).

Figure 2. (A) Direct analysis; (B) qPA; (C) qPA excluding the target, TOPI. To differentiate between rapid and cumulative changes, analyses were performed comparing time 0 with 60 and 2800 min. The protein TOPI is highlighted in (A), while the top-scoring key nodes used for further analysis are highlighted in (B) and (C).

To test the robustness of this approach, additional proteins were randomly removed from the input list prior to analysis (Figure 3). Removing 10% yielded the same two top KNs as with the full input list (Figure 3C), and thus the same candidates (including TOPI). When removing 20% of the input list (Figure 3D), TOPI could not be found using this approach, as it was not contained within the hits list for the highest-scoring KN from the ΔS(0–2800) analysis (though the highest-scoring KN from the ΔS(0–60) analysis did contain TOPI). However, the second-ranked KN in ΔS(0–2800), fibrinogen gamma, did contain TOPI within its hits list, showing the possibility to tune this method by increasing or decreasing the number of top-scoring key nodes whose corresponding hits lists will be used to generate putative drug candidates. Similarly to the 20% results, the fastest-changing KN in the 50% analysis contained TOPI (Figure 3E), yet the KN exhibiting the greatest magnitude change did not. Finally, the 100 highest-scoring proteins from the ΔS(0–60) and ΔS(0–2800) lists (182 proteins, including TOPI) were removed from the input (Figure 3F), with TOPI still identified as the ninth highest-scoring candidate, though now among 219 in total. The above results indicate that although qPA specificity is reduced when input proteins are removed, a significant amount of signal remains in the input and can be extracted by pathway analysis.

Figure 3. KN score changes (ΔS) in upstream key node analyses for speed (horizontal axis) and magnitude (vertical axis). The KNs with the greatest changes in either analysis are highlighted and labeled. Individual downstream KN analyses were then performed for each of these molecules. The upstream KN analysis started from: (A) full protein data set from (1); (B) same but excluding TOPI; (C–E) random removal from the input list of 10% (C), 20% (D), and 50% (E) of proteins; (F) removal of the 100 most rapidly and/or largest cumulatively changing proteins.

To investigate the importance of the quantitative information for pathway analysis, the input list of protein identifiers was scrambled, leaving each set of fluorescence values paired with a random protein identifier. If the abundance values were inconsequential to the analysis, that is, if knowledge of protein dynamics were unnecessary for elucidating TOPI as the drug target, then this scrambling should have no deleterious effect upon the analysis. As expected, however, upon performing the same procedure as outlined above, neither of the key nodes found from downstream searching contained Topoisomerase within its hits list. This result shows the critical role of input protein abundances for drug target elucidation.

In addition to providing quantitative information about key nodes and other molecules of interest, qPA also supplies a view onto the molecular pathways linking the observed proteins together. Figure 4 illustrates just one of the many possibilities to glean extra information from these analyses, showing the pathways connecting the fastest-changing (Figure 4, I) and largest-changing (Figure 4, II) KNs to the drug target, TOPI (qPA performed on the original input list), thus providing a simple visual descriptor of the processes driving the observed phenotypic response. Additionally, when using one of the many well-curated, web-based tools available for performing this analysis (multiple free and pay-for-use options currently exist), one is also able to link directly to those articles within the library which were used for curation of that particular molecule or reaction, thereby aiding in the development of more targeted follow-up studies.

Figure 4. Visualization of the downstream pathways to the target, Topoisomerase, from the two top-scoring key nodes in the original analysis: (I) fastest-changing key node, c-FLIP; (II) greatest magnitude change, Fibrinogen gamma. Note the complete documentation for these pathways, with both cellular event/process and correlating reference(s).1

CONCLUSIONS

qPA, using more general assumptions about drug target dynamics and only three of the 144 available time points, identified the single known camptothecin target, the enzyme Topoisomerase I (TOPI), among only a handful of putative targets. Importantly, identification was possible even without TOPI being directly observed within the input data. Our results confirm the potential of qPA for knowledge-based filtering of dynamic proteomics data. With sufficient information present in the data sets, drug target elucidation should be fast and efficient. In addition, though qPA was originally developed for mass-spectrometry data,6 its application here to fluorescence input demonstrates its universality. These results confirm the potential of qPA in drug target discovery and highlight the importance of systems biology approaches for analysis of proteomics data.

ASSOCIATED CONTENT

Supporting Information

Supplementary tables containing the results of both up- and downstream searches from BioBase. This material is available free of charge via the Internet at http://pubs.acs.org.

AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected]. Tel: +46 08 524 87594.

ACKNOWLEDGMENT

D.M.G. is grateful for support from a Wenner-Gren postdoctoral fellowship. This work was supported through grants from the Swedish Research Council (grant 2007-4410 to R.Z.).

REFERENCES

(1) (a) Liu, H.; Chang, D. W.; Yang, X. Interdimer processing and linearity of procaspase-3 activation. A unifying mechanism for the activation of initiator and effector caspases. J. Biol. Chem. 2005, 280 (12), 11578–82.
(b) Chang, D. W.; Xing, Z.; Pan, Y.; Algeciras-Schimnich, A.; Barnhart, B. C.; Yaish-Ohad, S.; Peter, M. E.; Yang, X. c-FLIP(L) is a dual function regulator for caspase-8 activation and CD95-mediated apoptosis. EMBO J. 2002, 21 (14), 3704–14.
(c) Irmler, M.; Thome, M.; Hahne, M.; Schneider, P.; Hofmann, K.; Steiner, V.; Bodmer, J. L.; Schroter, M.; Burns, K.; Mattmann, C.; Rimoldi, D.; French, L. E.; Tschopp, J. Inhibition of death receptor signals by cellular FLIP. Nature 1997, 388 (6638), 190–5.
(d) Tschopp, J.; Irmler, M.; Thome, M. Inhibition of fas death signals by FLIPs. Curr. Opin. Immunol. 1998, 10 (5), 552–8.
(e) Barila, D.; Rufini, A.; Condo, I.; Ventura, N.; Dorey, K.; Superti-Furga, G.; Testi, R. Caspase-dependent cleavage of c-Abl contributes to apoptosis. Mol. Cell. Biol. 2003, 23 (8), 2790–9.
(f) Yu, D.; Khan, E.; Khaleque, M. A.; Lee, J.; Laco, G.; Kohlhagen, G.; Kharbanda, S.; Cheng, Y. C.; Pommier, Y.; Bharti, A. Phosphorylation of DNA topoisomerase I by the c-Abl tyrosine kinase confers camptothecin sensitivity. J. Biol. Chem. 2004, 279 (50), 51851–61.
(g) Yokoyama, K.; Erickson, H. P.; Ikeda, Y.; Takada, Y. Identification of amino acid sequences in fibrinogen gamma-chain and tenascin C C-terminal domains critical for binding to integrin alpha v beta 3. J. Biol. Chem. 2000, 275 (22), 16891–8.
(h) Arias-Salgado, E. G.; Lizano, S.; Sarkar, S.; Brugge, J. S.; Ginsberg, M. H.; Shattil, S. J. Src kinase activation by direct interaction with the integrin beta cytoplasmic domain. Proc. Natl. Acad. Sci. U.S.A. 2003, 100 (23), 13298–302.
(i) Plattner, R.; Kadlec, L.; DeMali, K. A.; Kazlauskas, A.; Pendergast, A. M. c-Abl is activated by growth factors and Src family kinases and has a role in the cellular response to PDGF. Genes Dev. 1999, 13 (18), 2400–11.
(2) Paul, S. M.; Mytelka, D. S.; Dunwiddie, C. T.; Persinger, C. C.; Munos, B. H.; Lindborg, S. R.; Schacht, A. L. How to improve R&D productivity: the pharmaceutical industry’s grand challenge. Nat. Rev. Drug Discovery 2010, 9 (3), 203–14.
(3) Mathieu, M. P. Parexel’s Bio/Pharmaceutical R&D Statistical Sourcebook 2008/2009; Parexel International Corporation: Waltham, MA, 2008.
(4) Cohen, A. A.; Geva-Zatorsky, N.; Eden, E.; Frenkel-Morgenstern, M.; Issaeva, I.; Sigal, A.; Milo, R.; Cohen-Saidon, C.; Liron, Y.; Kam, Z.; Cohen, L.; Danon, T.; Perzov, N.; Alon, U. Dynamic Proteomics of Individual Cancer Cells in Response to a Drug. Science 2008, 322 (5907), 1511–1516.
(5) Jarvik, J. W.; Adler, S. A.; Telmer, C. A.; Subramaniam, V.; Lopez, A. J. CD-tagging: A new approach to gene and protein discovery and analysis. Biotechniques 1996, 20 (5), 896–&.
(6) Zubarev, R. A.; Nielsen, M. L.; Fung, E. M.; Savitski, M. M.; Kel-Margoulis, O.; Wingender, E.; Kel, A. Identification of dominant signaling pathways from proteomics expression data. J. Proteomics 2008, 71 (1), 89–96.