The Chromosome-Centric Human Proteome Project: A Call to Action

Perspective pubs.acs.org/jpr

Andreas F. R. Hühmer,*,† Aran Paulus,*,‡ LeRoy B. Martin,§ Kevin Millis,∥ Tasha Agreste,∥ Julian Saba,† Jennie R. Lill,⊥ Steven M. Fischer,# William Dracup,¶ and Paddy Lavery¶

† Thermo Fisher Scientific, Life Science Mass Spectrometry, San Jose, California 95134, United States
‡ Bio-Rad Laboratories, Life Science Group, San Jose, California 95126-2423, United States
§ Waters Corporation, Beverly, Massachusetts 01915, United States
∥ Cambridge Isotope Laboratories, Andover, Massachusetts 01810, United States
⊥ Genentech Inc., 1 DNA Way, South San Francisco, California 94080, United States
# Agilent Technologies, Santa Clara, California 95051, United States
¶ William Dracup, Nonlinear Dynamics Ltd., Keel House, Newcastle-upon-Tyne, NE1 2JE, England, United Kingdom

ABSTRACT: The grand vision of the human proteome project (HPP) is moving closer to reality with the recent announcement by HUPO of the creation of the HPP consortium in charge of the development of a two-part HPP, one focused on the description of proteomes of biological samples or related to diseases (B/D-HPP) and the other dedicated to a systematic description of proteins as gene products encoded in the human genome (the C-HPP). This new initiative of HUPO seeks to identify and characterize at least one representative protein from every gene and to create a protein distribution atlas and a protein pathway or network map. This vision for proteomics can be the roadmap of biological and clinical research for years to come if it delivers on its promises. The Industrial Advisory Board (IAB) to HUPO shares the vision of the C-HPP. The IAB will support and critically accompany the overall project goals and the definitions of the critical milestones. The member companies are in a unique position to develop hardware and software, reagents and standards, procedures, and workflows to ensure a reliable source of tools available to the proteomics community worldwide. In collaboration with academia, the IAB member companies can and must develop the tools to reach the ambitious project goals. We offer to partner with and challenge the academic groups leading the C-HPP to define both ambitious and obtainable goals and milestones to make the C-HPP a real and trusted resource for future biology.

KEYWORDS: Human Proteome Project, Human Proteome Organization, industrial advisory board, large-scale science, quantitative science, Human Genome Project, protein detection, proteome complexity, proteomics funding



Special Issue: Chromosome-centric Human Proteome Project
Received: October 5, 2012
Published: December 21, 2012
© 2012 American Chemical Society
dx.doi.org/10.1021/pr300933p | J. Proteome Res. 2013, 12, 28−32

THE HPP, AN UNCERTAIN START

The Human Proteome Organization (HUPO) was founded on February 9, 2001, just a few days before the initial publication of the human genome.1 The mission of HUPO is: To define and promote proteomics through international cooperation and collaborations by fostering the development of new technologies, techniques and training to better understand human disease.1 The vision of the Human Proteome Project (HPP) already existed at HUPO’s inception, yet almost a decade passed before the HPP, which is composed of the Chromosome-centric HPP (C-HPP) and the Biology/Disease-driven HPP (B/D-HPP), was announced in Sydney, Australia in September 2010,2,3 and officially launched in Geneva, Switzerland in September 2011.

During the decade that preceded those announcements, efforts were focused on creating proteome maps of individual organs and finding specific protein changes associated with diseases of major importance.4−6 Some of these efforts were very successful. For example, the first draft of the Chinese Human Liver Proteome Project was completed in 2009.7 It demonstrated the power of a concerted effort to understand the complexity of proteomes on an organ level. At the same time, the resulting large set of data provided many new insights but revealed the complexity associated with understanding the proteome of a human organ. It provided the community with valuable insight on the limitations of current protein detection technologies and highlighted the challenges associated with organizing, standardizing and interpreting high-quality protein data sets.

Other efforts that also involved substantial governmental and industrial resources were less successful. Overly ambitious promises to deliver disease-specific biomarkers and therapeutic targets largely did not materialize. While these setbacks left some supporters of proteomics disappointed or discouraged, they encouraged others to refocus and search for improved ways to succeed. It is also important to realize that the challenge of proteomics is enormous and much larger than the initial goal of sequencing the human genome. This led to the realization that proteomics needed an effective way of organizing the necessary global resources as well as developing a clear vision of what will constitute “success”.

THE C-HPP MISSION

A group of researchers within HUPO have now proposed an internationally coordinated, systematic chromosome-based human proteome project (C-HPP) as a first phase of the larger HPP.8 The C-HPP seeks to: Complete a highly accurate validation of the human proteome complement that provides a high quality, complete reference proteome data set. This includes:

1. Identification and characterization of at least one representative protein with its abundance and major modifications from every human gene.
2. Creation of a protein distribution atlas of these proteins, including their subcellular localization.
3. A protein pathway and network map representing the interactome.

The project is also attempting to organize the information into a genomic framework, develop a standardized data analysis approach and quality control, and provide access to this information to everyone.

Since the official launch of the HPP in Geneva, the current gene-centric C-HPP has gained momentum. The full set of 24 human chromosomes has been “adopted” by teams around the world. Teams from Asia and Australia are taking the lion’s share of the responsibility with 12 chromosomes, followed by the EU member states (6 chromosomes) and contributors from the Americas (6 chromosomes), including one from Brazil, South America.

Not everyone in the field has rallied behind the vision of the C-HPP. Other large-scale proteomics projects not under the umbrella of an official HUPO project have been proposed, funded, and undertaken.9 Those projects are not incompatible with, but rather complementary to, the C-HPP. Preliminary findings from those studies suggest that there are, in principle, no obstacles to validating all human proteins using current approaches and technologies.

Recent efforts to establish deep coverage of the proteomes of multiple cell lines have demonstrated that proteomics approaches today have the same efficiency and coverage as transcriptome measurements by RNA-sequencing methods.10−13 Indeed, proteins for all genes in the human genome could be validated relatively quickly if a suitable supply of highly differentiated cells and a broad range of human tissues in which all of the proteins are expected to be expressed are investigated.10 With the integration of these and other rapidly emerging systems-wide data into existing proteomics resources, the initial phase of the C-HPP project could be completed relatively quickly. This will allow the initiative to then focus on the next phase of identifying the major alternatively spliced variant and single nucleotide variant for each protein-coding gene as well as associated changes in the 3 major post-translational modifications (phosphorylation, acetylation, and glycosylation).

Comprehensive and deep sequencing of all human cells and tissues is a necessary step for biology as a large-scale science. About 40% of human genes have not been validated experimentally. Among the benefits of such an effort would be the valuable insight into how many proteins are actually expressed in normal cells or tissues and whether current high-throughput proteomics techniques are capable of detecting those missing proteins. The complexity, dynamic nature, and unknown depth of the human proteome are suspected to be enormous. Estimates are that the human proteome could give rise to about 1 000 000 potential protein forms including basic sequence variants, post-translational modifications, and in vivo cleavage products.14 Acquiring this information will require tools and validated methods that the proteomics community is poised to deliver.

With the benefit of insights that have emerged from genomics research, leaders in that field are now acknowledging that genomics alone is not enough.15 Genome biology needs to progress to cell biology and physiology with understanding of the linkages between genetic perturbations and the dysregulation of proteins downstream. Two decades of genomics have generated many potential disease targets and promising leads that await validation by proteomics. It makes sense to provide genomic scientists and medical researchers who know how to navigate the genome with tools to drill down to validated protein-level information.

Systematically organizing data using a gene-centric approach, as outlined in the C-HPP, is also a logical consequence of the divide-and-conquer strategy the proteomics community has adopted. The strategy is a practical approach to organizing the project and a method of driving clear responsibilities with standardized processes. It is not a philosophical capitulation to a gene-centric world view or a commitment to an omics-only, hypothesis-free research perspective.8

We, the IAB of HUPO, see the C-HPP at a critical stage with a number of synergies emerging. New insights are rapidly gained from the early large-scale protein expression studies. Protein detection technology continues to become more powerful and easier to use. The proteomics community and its industrial partners must recognize the convergence of these positive factors and collaborate now to complete the C-HPP and evolve the larger vision of the HPP. The IAB is articulating its support of the C-HPP in this special issue of Journal of Proteome Research (JPR).

Recognizing the complexity of the human proteome, the C-HPP has set a goal of 10 years for the completion of the project.16 We think that is an insufficiently ambitious goal that supports the perspective of some critics that the C-HPP is an unworthy project. The C-HPP should press on with the support of industry and complete a draft of the C-HPP well within the stated goal of 10 years, using bottom-up and top-down proteomics methods as well as antibodies.



HPP - BEYOND CATALOGING

The HPP has been criticized by many as “another cataloging exercise” and stigmatized as a duplication effort of the Human Genome Project (HGP) that will not significantly contribute to the pressing need to find new therapeutic treatments of common diseases. In a response to a recent proposal by the US government to focus on omics as part of its blueprint for the bioeconomy,17 it was stated that all omics efforts should effectively be outsourced, suggesting that cataloging efforts are essentially not a worthwhile use of US tax dollars. Funding and scientific resources instead should be shifted toward research and technical approaches that drive the understanding of modular biology and systems biology.18 We think that the C-HPP and the B/D-HPP have a much larger role in the progression of biological science. HUPO needs to continue to develop and articulate ways in which the C-HPP and the B/D-HPP will affect the larger biomedical research community and basic biology.19,20 We offer perspectives on the following four areas of the C-HPP and the B/D-HPP that we think are critical to its success.

1. Leaders of the HPP have not stated how the HGP and the HPP projects clearly differ and how the future phases of the HPP projects will accelerate our basic understanding of the modular architecture of organisms and contribute to the basic understanding of biology.21 A protein pathway and network map representing the interactome has been articulated as part of the C-HPP mission statement, but few details have been discussed besides the goal of creating an inventory of such circuitry.16 We agree that the mapping of proteins and their interactions with other proteins as part of cellular networks is the basis for a comprehensive picture of the cellular circuitry.19 However, an understanding of the dynamic nature of biological systems requires that proteomics provide temporally and spatially resolved maps of molecular processes. We need to make a movie of cellular events rather than taking random snapshots of cellular states and static pictures of network connectivity.22 Why not aim to create the “YouTube channel” for modular and systems biology to capture the dynamics of emerging network modules?

2. In our opinion, a completely different set of scientific tools is necessary to enable systematic studies of functional subunits in human and model organisms. In that respect, the current HPP tool set is too narrowly defined and a much larger set of methods has to be part of the HPP strategy. For example, the knowledge database as part of the third pillar of the HPP can only be a first step and the concept has to extend beyond the reference portal for the human proteome. The informatics aspects of the HPP are largely defined as a massive data collection and warehousing effort today, and a much clearer and more ambitious vision has to emerge that includes the goal of creating and validating data sets for systems biology studies.23 Proteomics has the opportunity to establish biology as a quantitative science, and informatics cannot be the afterthought of such an important project, but has to be front and center to the efforts. Otherwise, the project is in danger of being seriously underfunded in one of the most important scientific opportunities of the HPP, the quantitative study of disease mechanisms and discovery of associated targets for therapeutic intervention.

3. Proteomics so far has been evolving largely isolated from mainstream biology24 and inclusion of the broader biology community has emerged as the biggest challenge of the HPP. A recent bibliometric analysis of research activities in the past 20 years found that most protein research focuses on those proteins known before the human genome was mapped. A shift in research activity was often spurred by the emergence of tools to study a particular protein, not by a change in the protein’s perceived importance. The study concluded that high-quality readily available research tools are needed for all discovered proteins to drive research into the unstudied parts of the human genome.25 The IAB of HUPO thinks that the HPP is well positioned to address this problem and HUPO should make this its core challenge and mission moving forward.28

4. A growing number of biologists today utilize the new technologies, workflows and resources generated by proteomics in the past decade. A convergence of traditional protein analysis techniques familiar to biologists with proteomics approaches is evident. The HPP should include those methods as part of a horizontal integrated layer across the three pillars of technology and create connections that integrate traditional with emerging protein detection tools.14 The orthogonal layer of techniques should include many traditional biochemical and molecular methods, such as protein functional assays with protein-specific chemical probes, siRNA for post-transcriptional gene silencing, protein activity assays and techniques probing protein structural changes. A much broader set of tools integrated with proteomics techniques will be more attractive to the larger biology community and drive synergies to unlock the complete picture of the proteome and its complexity. The set of proteins expressed in a particular cell or cell type is known as its proteome, and the goals of the HPP should not attempt to be all inclusive. However, the HPP has an integrative role and needs to take advantage of it. By creating a clear vision of what the HPP is going to deliver beyond a catalog, and by linking its goals to activities and methods in the biology community, the HPP will take the first step toward transforming biology from a descriptive into a quantitative, hypothesis-driven large-scale science.

ROLE OF INDUSTRY

History shows that industry has a significant role to play in the evolution and success of large-scale scientific endeavors. Unlike the HGP, where industry became one of the major drivers of the project, industrial partners of the HPP have so far not taken an active role in the shaping of the HPP. The HPP needs the support and participation of industry, and in turn we, the biological tools industry, want to be a partner of the HPP. The current HPP mission defines as one of its goals and outcomes the delivery of publicly available resources, which include protein profiling methods, protein-specific reagents and cDNA clones of all human protein-coding genes. We feel that the academic sector is not particularly well positioned to deliver on those promises, but should rely on the expertise of its industrial partners to produce and distribute high-quality research tools that include instrumentation, software and consumables. The role of industry also includes the specific responsibility for and involvement in the integration of a set of well-defined complementary technology platforms that make proteomic techniques more accessible to biological research.24 If the ultimate goal is to make high-throughput protein detection so simple and inexpensive in the future that it can be routinely deployed as a universal tool of biology, the industrial partners represented by the HUPO IAB will have to help to develop and disseminate well-defined approaches, methods, and complementary technologies that make it possible.



C-HPP - A CALL TO ACTION

Large-scale science projects require equally large-scale processes, efficient networks of communication, and big-picture management to achieve division of labor and efficiencies of scale. Other scientific disciplines, such as particle physics and astronomy, are already comfortable with such processes. The field of biology is less accustomed to such large-scale science. The C-HPP will require broad participation and cooperation across the scientific community, public agencies, industries, and countries. In all of this, HUPO must take a leadership role, because ultimately, creating the scientific, administrative, and technical processes to enable such large-scale science may prove as valuable as the data itself.26

The HPP began in the shadow of the financial crisis of 2008. No one would deny that the macro-economic environment today is less advantageous than the easy-money economic bubble enjoyed by the HGP. At the same time, the C-HPP will benefit from advances of the digital age, including new ways of collaborating and sharing data, that were not available to previous large-scale science projects. Crowd sourcing through rapid and free data sharing is fast becoming accepted as a method to generate knowledge and will accelerate the HPP in ways that were impossible and unimaginable a decade ago. By tapping into the next generation of scientists with preferences for and familiarity with social media, economies of scale for the C-HPP can be achieved through a true division of labor that reduces undesired parallel efforts.

Public and private funding agencies across the world should recognize this unique opportunity. They can support the project either directly through appropriate grants or indirectly, by requesting that investigators with existing funding use standards recommended by HUPO and make their results available to the C-HPP. Funding agencies should also take the opportunity to establish novel and innovative protein detection technologies as part of the C-HPP participation. More importantly, funding agencies need to recognize that participating teams or countries will become part of a highly integrated network that will strengthen the regional scientific base, establish new collaborative networks, and further their institutional and national research agendas.

As with most scientific efforts, there is no way to predict with certainty how greatly the C-HPP will impact the science of proteins or the field of biology. The same was true of the Human Genome Project. Today, no one would deny the impact of the HGP, the power of the genome maps, and the cheap sequencing technologies generated as a result of the project.27 Generating a complete picture of the human genome not only created a framework for subsequent research, it enabled new ways of thinking about hypothesis-driven and hypothesis-free science. It forged collaborations between disciplines that rarely interacted before. We should not deny the field of proteomics the opportunity to take the achievements of the HGP to the next level by fully supporting the goals of the C-HPP.

ABBREVIATIONS

HUPO, Human Proteome Organization; HPP, Human Proteome Project; C-HPP, Chromosome-centric Human Proteome Project; IAB, Industrial Advisory Board






AUTHOR INFORMATION

Corresponding Author

*E-mail: andreas.huhmer@thermofisher.com; aranpaulus@sbcglobal.net.

Notes

The authors declare no competing financial interest.

ACKNOWLEDGMENTS

We thank David C. Fisher at Thermo Fisher Scientific for the critical review of the manuscript. Funding was provided by the authors' respective companies.

REFERENCES

(1) HUPO. Mission statement of HUPO, the Human Proteome Organization; www.hupo.org.
(2) Editorial. The call of the human proteome. Nat. Methods 2010, 7 (9), 661.
(3) HUPO. A gene-centric human proteome project: HUPO-the Human Proteome Organization. Mol. Cell. Proteomics 2010, 2, 427−9.
(4) Omenn, G. S.; Aebersold, R.; Paik, Y. K. The human plasma proteome project. Proteomics 2009, 9, 4−6.
(5) Hamacher, M.; Hardt, T.; van Hall, A.; Stephan, C.; Marcus, K.; et al. Inside SMP proteomics: six years German Human Brain Proteome Project (HBPP) - a summary. Proteomics 2008, 8 (6), 1118−28.
(6) He, F. Human liver proteome project: plan, progress, and perspectives. Mol. Cell. Proteomics 2005, 4 (12), 1841−8.
(7) Sun, A.; Jiang, Y.; Wang, X.; Liu, Q.; Zhong, F.; He, Q.; Guan, W.; Li, H.; Sun, Y.; Shi, L.; Yu, H.; Yang, D.; Xu, Y.; Song, Y.; Tong, W.; Li, D.; Lin, C.; Hao, Y.; Geng, C.; Yun, D.; Zhang, X.; Yuan, X.; Chen, P.; Zhu, Y.; Li, Y.; Liang, S.; Zhao, X.; Liu, S.; He, F. Liverbase: a comprehensive view of human liver biology. J. Proteome Res. 2010, 9 (1), 50−8.
(8) Paik, Y. K.; Jeong, S. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Cho, S. Y.; Lee, H. J.; Na, K.; Choi, E. Y.; Yan, F.; Zhang, F.; Zhang, Y.; Snyder, M.; Cheng, Y.; Chen, R.; Marko-Varga, G.; Deutsch, E. W.; Kim, H.; Kwon, J. Y.; Aebersold, R.; Bairoch, A.; Taylor, A. D.; Kim, K. Y.; Lee, E. Y.; Hochstrasser, D.; Legrain, P.; Hancock, W. S. The Chromosome-Centric Human Proteome Project for cataloging proteins encoded in the genome. Nat. Biotechnol. 2012, 7, 30 (3), 221−3.
(9) Lamond, A. I.; Uhlen, M.; Horning, S.; Makarov, A.; Robinson, C. V.; Serrano, L.; Hartl, F. U.; Baumeister, W.; Werenskiold, A. K.; Andersen, J. S.; Vorm, O.; Linial, M.; Aebersold, R.; Mann, M. Advancing cell biology through proteomics in space and time (PROSPECTS). Mol. Cell. Proteomics 2012, 11 (3), O112.017731.
(10) Geiger, T.; Wehner, A.; Schaab, C.; Cox, J.; Mann, M. Comparative proteomic analysis of eleven common cell lines reveals ubiquitous but varying expression of most proteins. Mol. Cell. Proteomics 2012, 11 (3), M111.014050.
(11) Lundberg, E.; Jagerberg, L.; Klevebring, D.; Matic, I.; Geiger, T.; Cox, J.; Algenas, C.; Lundeberg, J.; Mann, M.; Uhlen, M. Defining the transcriptome and proteome in three functionally different human cell lines. Mol. Syst. Biol. 2010, 6, 450.
(12) Munoz, J.; Low, T. Y.; Kok, Y. J.; Chin, A.; Frese, C. K.; Ding, V.; Choo, A.; Heck, A. J. The quantitative proteomes of human-induced pluripotent stem cells and embryonic stem cells. Mol. Syst. Biol. 2011, 7, 550.
(13) Beck, M.; Schmidt, A.; Malmstroem, J.; Claassen, M.; Ori, A.; Szymborska, A.; Herzog, F.; Rinner, O.; Ellenberg, J.; Aebersold, R. The quantitative proteome of a human cell line. Mol. Syst. Biol. 2011, 7, 549.
(14) Legrain, P.; Aebersold, R.; Archakov, A.; Bairoch, A.; Bala, K.; Beretta, L.; Bergeron, J.; Borchers, C. H.; Corthals, G. L.; Costello, C. E.; Deutsch, E. W.; Domon, B.; Hancock, W.; He, F.; Hochstrasser, D.; Marko-Varga, G.; Salekdeh, G. H.; Sechi, S.; Snyder, M.; Srivastava, S.; Uhlén, M.; Wu, C. H.; Yamamoto, T.; Paik, Y. K.; Omenn, G. S. The human proteome project: current state and future direction. Mol. Cell. Proteomics 2011, 10 (7), M111.009993.
(15) Chakravarti, A. Genomics is not enough. Science 2011, 335, 15.
(16) Paik, Y. K.; Omenn, G. S.; Uhlen, M.; Hanash, S.; Marko-Varga, G.; Aebersold, R.; Bairoch, A.; Yamamoto, T.; Legrain, P.; Lee, H. J.; Na, K.; Jeong, S. K.; He, F.; Binz, P. A.; Nishimura, T.; Keown, P.; Baker, M. S.; Yoo, J. S.; Garin, J.; Archakov, A.; Bergeron, J.; Salekdeh, G. H.; Hancock, W. S. Standard guidelines for the chromosome-centric human proteome project. J. Proteome Res. 2012, 11 (4), 2005−13.
(17) Office of Science and Technology Policy. National Bioeconomy Blueprint. Office of Science and Technology Policy, Executive Office of the President: Washington, D.C., 2012; p 48.
(18) Editorial. Big ideas and grand challenges. Nat. Biotechnol. 2011, 29 (11), 951.
(19) Vidal, M.; Chan, D. W.; Gerstein, M.; Mann, M.; Omenn, G. S.; Tagle, D.; Sechi, S. Workshop Participants. The human proteome - a scientific opportunity for transforming diagnostics, therapeutics, and healthcare. Clin. Proteomics 2012, 9 (1), 6.
(20) Hood, L. E.; Omenn, G. S.; Moritz, R. L.; Aebersold, R.; Yamamoto, K.; Amos, M.; Hunter-Cevera, J.; Locascio, L. Workshop Participants. New and improved proteomics technologies for understanding complex biological systems: Addressing a grand challenge in the life sciences. Proteomics 2012, 12 (18), 2773−83.
(21) Hartwell, L. H.; Hopfield, J. J.; Leibler, S.; Murray, A. W. From molecular to modular cell biology. Nature 1999, 402, C47−52.
(22) Goel, A.; Li, S. S.; Wilkins, M. R. Four-dimensional visualization and analysis of protein-protein interaction networks. Proteomics 2011, 11 (13), 2672−82.
(23) Isalan, M. A cell in a computer. Nature 2012, 488, 40−1.
(24) Editorial. Mind the technology gap. Nat. Methods 2012, 9 (4), 311.
(25) Edwards, A. M. Too many roads not taken. Nature 2011, 470, 163.
(26) Collins, F. S.; Morgan, M.; Patrinos, A. The Human Genome Project: lessons from large-scale biology. Science 2003, 300 (5617), 286−90.
(27) Lander, E. S. Initial impact of the sequencing of the human genome. Nature 2011, 470 (7333), 187−97.
(28) Menon, R.; Roy, A.; Mukherjee, S.; Belkin, S.; Zhang, Y.; Omenn, G. S. Functional implications of structural predictions for alternative splice proteins expressed in Her2/neu-induced breast cancers. J. Proteome Res. 2011, 10 (12), 5503−11.
