Perspective pubs.acs.org/jpr

Proteomic Studies Related to Genetic Determinants of Variability in Protein Concentrations

Péter Horvatovich,*,†,‡,§ Lude Franke,∥ and Rainer Bischoff†,‡,§

† Analytical Biochemistry, Department of Pharmacy, University of Groningen, A. Deusinglaan 1, 9713 AV Groningen, The Netherlands
‡ Netherlands Bioinformatics Centre, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands
§ Netherlands Proteomics Centre, Padualaan 8, 3584 CH Utrecht, The Netherlands
∥ Department of Genetics, University Medical Center Groningen, University of Groningen, Hanzeplein 1, 9713 GZ Groningen, The Netherlands

ABSTRACT: Genetic variation has multiple effects on the proteome. It may influence the expression level of proteins, modify their sequences through single nucleotide polymorphisms, the occurrence of allelic variants, or alternative splicing (ASP) events. This perspective paper summarizes the major effects of genetic variability on protein expression and isoforms and provides an overview of proteomics techniques and methods that allow studying the effects of genetic variability at different levels of the proteome. The paper provides an overview of recent quantitative trait loci studies performed to explore the effect of genetic variation on protein expression (pQTL). Finally it gives a perspective view on advances in proteomics technology and the role of the Chromosome-Centric Human Proteome Project (C-HPP) by creating large-scale resources that may facilitate performing more comprehensive pQTL experiments in the future.

KEYWORDS: quantitative trait loci, genetic variation, proteomics, transcriptomics, mass spectrometry, single nucleotide polymorphism, genome-wide association studies



© 2013 American Chemical Society

INTRODUCTION

Current genomics technologies make it possible to study the genome-wide effect of genetic variability on the transcriptome by expression quantitative trait locus (eQTL)1−5 mapping and to associate single nucleotide polymorphisms (SNPs) with disease by genome-wide association studies (GWAS). Combining these two techniques allows identification of genes that are involved in particular phenotypes and, more importantly, that may be in a causal relationship with disease (Figure 1). Mendelian randomization and causal inference analysis can address this question in an exact manner but require very large sample sizes of at least 20 000 samples.6 These genomic techniques are now generating large numbers of hypotheses on the role of SNPs and genes in disease. However, the validity of these hypotheses must be tested at the protein level to get a comprehensive picture of the molecular mechanisms of biological events in relation to disease onset and progression. Genetic variation can have two types of effects at the proteome level: (1) it may give rise to different isoforms of the protein encoded by a gene, which requires identification of protein isoform sequences, or (2) it may influence the concentration and activity of the translated protein, which can be assessed by protein quantitative trait loci (pQTL) mapping.
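To illustrate what pQTL mapping amounts to for a single SNP and a single protein, the following minimal Python sketch regresses a measured protein concentration on allele dosage (0, 1, or 2 copies of one allele). The additive model, the function name `pqtl_effect`, and the data values are illustrative assumptions and are not taken from any of the studies discussed here; actual pQTL analyses additionally adjust for covariates and correct for multiple testing.

```python
from statistics import mean


def pqtl_effect(dosages, concentrations):
    """Least-squares fit of concentration = intercept + slope * dosage.

    Returns (slope, intercept, r_squared), where slope is the estimated
    change in protein concentration per allele copy and r_squared is the
    fraction of concentration variance explained by the genotype.
    """
    if len(dosages) != len(concentrations) or len(dosages) < 3:
        raise ValueError("need paired observations for at least 3 samples")
    mx, my = mean(dosages), mean(concentrations)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in concentrations)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, concentrations))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and trait must both vary across samples")
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: six individuals, one SNP, one protein (arbitrary units).
dosages = [0, 0, 1, 1, 2, 2]
concentrations = [1.0, 1.2, 1.9, 2.1, 3.0, 3.2]

slope, intercept, r2 = pqtl_effect(dosages, concentrations)
print(f"effect per allele: {slope:.2f}, variance explained: {r2:.2f}")
```

In a genome-wide setting this fit would be repeated for every SNP-protein pair, which is why the significance threshold has to account for the number of tests performed.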

One should realize that SNPs do not solely affect total gene expression levels: they can also exert their effect by causing alternative splicing (ASP),1−3 they can be nonsynonymous (nsSNP), resulting in a different protein sequence, they can cause alternative polyadenylation of the 3′ UTR,7 or they can result in altered RNA decay.8 SNPs in noncoding regions may influence transcription and translation regulation and RNA editing mechanisms via ADAR and APOBEC proteins9,10 and are sources of transcriptome and protein diversity. Proteins are regulated by genomic and environmental factors via more complex mechanisms than transcripts alone. However, the main structure of protein interaction networks and pathways is reflected at both levels.11 This common framework can be used to integrate transcriptomics and proteomics data and to validate the results of eQTL studies and GWAS at the proteome level. Post-translational modification (PTM) of proteins by enzymatic or nonenzymatic chemical reactions adds another dimension of regulation creating an enormous

Special Issue: Chromosome-centric Human Proteome Project
Received: July 23, 2013
Published: November 17, 2013

dx.doi.org/10.1021/pr400765y | J. Proteome Res. 2014, 13, 5−14

Journal of Proteome Research


Figure 1. Summary of the most widely used analysis techniques in genomics studies. Genome-wide association studies (GWAS) identify SNPs (upper panel) that are associated with disease without studying which genes are involved with the association. Association is summarized in a Manhattan plot, which shows for each SNP (x axis) the strength of association as the negative logarithm of the probability of, e.g., the odds ratio based on a chi-square test (y axis). The threshold is calculated using multiple testing corrections. The Manhattan plot is taken from ref 78. Expression quantitative trait loci (eQTL) identify SNPs, which influence gene expression levels (lower panel). One SNP may influence the expression level of one or multiple genes, which can be shown with box plots (lower right plot taken from ref 35). In cis eQTL the associated SNP and gene transcript (trait) are close (

REFERENCES

(32) Keren, H.; Lev-Maor, G.; Ast, G. Alternative splicing and evolution: diversification, exon definition and function. Nat. Rev. Genet. 2010, 11 (5), 345−55.
(33) Bonnelykke, K.; Matheson, M. C.; Pers, T. H.; Granell, R.; Strachan, D. P.; Alves, A. C.; Linneberg, A.; Curtin, J. A.; Warrington, N. M.; Standl, M.; Kerkhof, M.; Jonsdottir, I.; Bukvic, B. K.; Kaakinen, M.; Sleimann, P.; Thorleifsson, G.; Thorsteinsdottir, U.; Schramm, K.; Baltic, S.; Kreiner-Moller, E.; Simpson, A.; Pourcain, B. S.; Coin, L.; Hui, J.; Walters, E. H.; Tiesler, C. M.; Duffy, D. L.; Jones, G.; Ring, S. M.; McArdle, W. L.; Price, L.; Robertson, C. F.; Pekkanen, J.; Tang, C. S.; Thiering, E.; Montgomery, G. W.; Hartikainen, A. L.; Dharmage, S. C.; Husemoen, L. L.; Herder, C.; Kemp, J. P.; Elliot, P.; James, A.; Waldenberger, M.; Abramson, M. J.; Fairfax, B. P.; Knight, J. C.; Gupta, R.; Thompson, P. J.; Holt, P.; Sly, P.; Hirschhorn, J. N.; Blekic, M.; Weidinger, S.; Hakonarsson, H.; Stefansson, K.; Heinrich, J.; Postma, D. S.; Custovic, A.; Pennell, C. E.; Jarvelin, M. R.; Koppelman, G. H.; Timpson, N.; Ferreira, M. A.; Bisgaard, H.; Henderson, A. J. Meta-analysis of genome-wide association studies identifies ten loci influencing allergic sensitization. Nat. Genet. 2013, 45 (8), 902−6.
(34) Himes, B. E.; Sheppard, K.; Berndt, A.; Leme, A. S.; Myers, R. A.; Gignoux, C. R.; Levin, A. M.; Gauderman, W. J.; Yang, J. J.; Mathias, R. A.; Romieu, I.; Torgerson, D. G.; Roth, L. A.; Huntsman, S.; Eng, C.; Klanderman, B.; Ziniti, J.; Senter-Sylvia, J.; Szefler, S. J.; Lemanske, R. F., Jr.; Zeiger, R. S.; Strunk, R. C.; Martinez, F. D.; Boushey, H.; Chinchilli, V. M.; Israel, E.; Mauger, D.; Koppelman, G. H.; Postma, D. S.; Nieuwenhuis, M. A.; Vonk, J. M.; Lima, J. J.; Irvin, C. G.; Peters, S. P.; Kubo, M.; Tamari, M.; Nakamura, Y.; Litonjua, A. A.; Tantisira, K. G.; Raby, B. A.; Bleecker, E. R.; Meyers, D. A.; London, S. J.; Barnes, K. C.; Gilliland, F. D.; Williams, L. K.; Burchard, E. G.; Nicolae, D. L.; Ober, C.; DeMeo, D. L.; Silverman, E. K.; Paigen, B.; Churchill, G.; Shapiro, S. D.; Weiss, S. T. Integration of mouse and human genome-wide association data identifies KCNIP4 as an asthma gene. PLoS One 2013, 8 (2), e56179.
(35) Hao, K.; Bosse, Y.; Nickle, D. C.; Pare, P. D.; Postma, D. S.; Laviolette, M.; Sandford, A.; Hackett, T. L.; Daley, D.; Hogg, J. C.; Elliott, W. M.; Couture, C.; Lamontagne, M.; Brandsma, C. A.; van den Berge, M.; Koppelman, G.; Reicin, A. S.; Nicholson, D. W.; Malkov, V.; Derry, J. M.; Suver, C.; Tsou, J. A.; Kulkarni, A.; Zhang, C.; Vessey, R.; Opiteck, G. J.; Curtis, S. P.; Timens, W.; Sin, D. D. Lung eQTLs to help reveal the molecular underpinnings of asthma. PLoS Genet. 2012, 8 (11), e1003029.
(36) Franke, L.; Jansen, R. C. eQTL analysis in humans. Methods Mol. Biol. 2009, 573, 311−28.
(37) Majewski, J.; Pastinen, T. The study of eQTL variations by RNA-seq: from SNPs to phenotypes. Trends Genet. 2011, 27 (2), 72−9.
(38) Cookson, W.; Liang, L.; Abecasis, G.; Moffatt, M.; Lathrop, M. Mapping complex disease traits with global gene expression. Nat. Rev. Genet. 2009, 10 (3), 184−94.
(39) Diz, A. P.; Dudley, E.; MacDonald, B. W.; Pina, B.; Kenchington, E. L.; Zouros, E.; Skibinski, D. O. Genetic variation underlying protein expression in eggs of the marine mussel Mytilus edulis. Mol. Cell. Proteomics 2009, 8 (1), 132−44.
(40) Homuth, G.; Teumer, A.; Volker, U.; Nauck, M. A description of large-scale metabolomics studies: increasing value by combining metabolomics with genome-wide SNP genotyping and transcriptional profiling. J. Endocrinol. 2012, 215 (1), 17−28.
(41) Illig, T.; Gieger, C.; Zhai, G.; Romisch-Margl, W.; Wang-Sattler, R.; Prehn, C.; Altmaier, E.; Kastenmuller, G.; Kato, B. S.; Mewes, H. W.; Meitinger, T.; de Angelis, M. H.; Kronenberg, F.; Soranzo, N.; Wichmann, H. E.; Spector, T. D.; Adamski, J.; Suhre, K. A genome-wide perspective of genetic variation in human metabolism. Nat. Genet. 2010, 42 (2), 137−41.
(42) Suhre, K.; Wallaschofski, H.; Raffler, J.; Friedrich, N.; Haring, R.; Michael, K.; Wasner, C.; Krebs, A.; Kronenberg, F.; Chang, D.; Meisinger, C.; Wichmann, H. E.; Hoffmann, W.; Volzke, H.; Volker, U.; Teumer, A.; Biffar, R.; Kocher, T.; Felix, S. B.; Illig, T.; Kroemer, H. K.; Gieger, C.; Romisch-Margl, W.; Nauck, M. A genome-wide
80 000 subjects identifies multiple loci for C-reactive protein levels. Circulation 2011, 123 (7), 731−8.
(52) Garge, N.; Pan, H.; Rowland, M. D.; Cargile, B. J.; Zhang, X.; Cooley, P. C.; Page, G. P.; Bunger, M. K. Identification of quantitative trait loci underlying proteome variation in human lymphoblastoid cells. Mol. Cell. Proteomics 2010, 9 (7), 1383−99.
(53) Melzer, D.; Perry, J. R.; Hernandez, D.; Corsi, A. M.; Stevens, K.; Rafferty, I.; Lauretani, F.; Murray, A.; Gibbs, J. R.; Paolisso, G.; Rafiq, S.; Simon-Sanchez, J.; Lango, H.; Scholz, S.; Weedon, M. N.; Arepalli, S.; Rice, N.; Washecka, N.; Hurst, A.; Britton, A.; Henley, W.; van de Leemput, J.; Li, R.; Newman, A. B.; Tranah, G.; Harris, T.; Panicker, V.; Dayan, C.; Bennett, A.; McCarthy, M. I.; Ruokonen, A.; Jarvelin, M. R.; Guralnik, J.; Bandinelli, S.; Frayling, T. M.; Singleton, A.; Ferrucci, L. A genome-wide association study identifies protein quantitative trait loci (pQTLs). PLoS Genet. 2008, 4 (5), e1000072.


(54) Michalski, A.; Cox, J.; Mann, M. More than 100,000 detectable peptide species elute in single shotgun proteomics runs but the majority is inaccessible to data-dependent LC−MS/MS. J. Proteome Res. 2011, 10 (4), 1785−93.
(55) Johansson, A.; Enroth, S.; Palmblad, M.; Deelder, A. M.; Bergquist, J.; Gyllensten, U. Identification of genetic variants influencing the human plasma proteome. Proc. Natl. Acad. Sci. U. S. A. 2013, 110 (12), 4673−8.
(56) Palmblad, M.; van der Burgt, Y. E.; Mostovenko, E.; Dalebout, H.; Deelder, A. M. A novel mass spectrometry cluster for high-throughput quantitative proteomics. J. Am. Soc. Mass Spectrom. 2010, 21 (6), 1002−11.
(57) Surinova, S.; Schiess, R.; Huttenhain, R.; Cerciello, F.; Wollscheid, B.; Aebersold, R. On the development of plasma protein biomarkers. J. Proteome Res. 2011, 10 (1), 5−16.
(58) Anderson, N. L.; Anderson, N. G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell. Proteomics 2002, 1 (11), 845−67.
(59) Foss, E. J.; Radulovic, D.; Shaffer, S. A.; Ruderfer, D. M.; Bedalov, A.; Goodlett, D. R.; Kruglyak, L. Genetic basis of proteome variation in yeast. Nat. Genet. 2007, 39 (11), 1369−75.
(60) Butter, F.; Davison, L.; Viturawong, T.; Scheibe, M.; Vermeulen, M.; Todd, J. A.; Mann, M. Proteome-wide analysis of disease-associated SNPs that show allele-specific transcription factor binding. PLoS Genet. 2012, 8 (9), e1002982.
(61) Tress, M. L.; Bodenmiller, B.; Aebersold, R.; Valencia, A. Proteomics studies confirm the presence of alternative protein isoforms on a large scale. Genome Biol. 2008, 9 (11), R162.
(62) Blakeley, P.; Siepen, J. A.; Lawless, C.; Hubbard, S. J. Investigating protein isoforms via proteomics: a feasibility study. Proteomics 2010, 10 (6), 1127−40.
(63) Desiere, F.; Deutsch, E. W.; King, N. L.; Nesvizhskii, A. I.; Mallick, P.; Eng, J.; Chen, S.; Eddes, J.; Loevenich, S. N.; Aebersold, R. The PeptideAtlas project. Nucleic Acids Res. 2006, 34 (Database issue), D655−8.
(64) Deutsch, E. W. The PeptideAtlas Project. Methods Mol. Biol. 2010, 604, 285−96.
(65) Eng, J.; McCormack, A.; Yates, J. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994, 5 (11), 976−89.
(66) Nagaraj, N.; Wisniewski, J. R.; Geiger, T.; Cox, J.; Kircher, M.; Kelso, J.; Paabo, S.; Mann, M. Deep proteome and transcriptome mapping of a human cancer cell line. Mol. Syst. Biol. 2011, 7, 548.
(67) Cote, R. G.; Griss, J.; Dianes, J. A.; Wang, R.; Wright, J. C.; van den Toorn, H. W.; van Breukelen, B.; Heck, A. J.; Hulstaert, N.; Martens, L.; Reisinger, F.; Csordas, A.; Ovelleiro, D.; Perez-Riverol, Y.; Barsnes, H.; Hermjakob, H.; Vizcaino, J. A. The PRoteomics IDEntification (PRIDE) Converter 2 framework: an improved suite of tools to facilitate data submission to the PRIDE database and the ProteomeXchange consortium. Mol. Cell. Proteomics 2012, 11 (12), 1682−9.
(68) Fagerberg, L.; Oksvold, P.; Skogs, M.; Algenas, C.; Lundberg, E.; Ponten, F.; Sivertsson, A.; Odeberg, J.; Klevebring, D.; Kampf, C.; Asplund, A.; Sjostedt, E.; Al-Khalili Szigyarto, C.; Edqvist, P. H.; Olsson, I.; Rydberg, U.; Hudson, P.; Ottosson Takanen, J.; Berling, H.; Bjorling, L.; Tegel, H.; Rockberg, J.; Nilsson, P.; Navani, S.; Jirstrom, K.; Mulder, J.; Schwenk, J. M.; Zwahlen, M.; Hober, S.; Forsberg, M.; von Feilitzen, K.; Uhlen, M. Contribution of antibody-based protein profiling to the human Chromosome-centric Proteome Project (C-HPP). J. Proteome Res. 2013, 12 (6), 2439−48.
(69) Uhlen, M.; Oksvold, P.; Fagerberg, L.; Lundberg, E.; Jonasson, K.; Forsberg, M.; Zwahlen, M.; Kampf, C.; Wester, K.; Hober, S.; Wernerus, H.; Bjorling, L.; Ponten, F. Towards a knowledge-based Human Protein Atlas. Nat. Biotechnol. 2010, 28 (12), 1248−50.
(70) Meyer, B.; Papasotiriou, D. G.; Karas, M. 100% protein sequence coverage: a modern form of surrealism in proteomics. Amino Acids 2011, 41 (2), 291−310.
(71) Frese, C. K.; Altelaar, A. F.; van den Toorn, H.; Nolting, D.; Griep-Raming, J.; Heck, A. J.; Mohammed, S. Toward full peptide sequence coverage by dual fragmentation combining electron-transfer and higher-energy collision dissociation tandem mass spectrometry. Anal. Chem. 2012, 84 (22), 9668−73.
(72) Guthals, A.; Watrous, J. D.; Dorrestein, P. C.; Bandeira, N. The spectral networks paradigm in high throughput mass spectrometry. Mol. BioSyst. 2012, 8 (10), 2535−44.
(73) Lam, H. Spectral archives: a vision for future proteomics data repositories. Nat. Methods 2011, 8 (7), 546−8.
(74) Frank, A. M.; Monroe, M. E.; Shah, A. R.; Carver, J. J.; Bandeira, N.; Moore, R. J.; Anderson, G. A.; Smith, R. D.; Pevzner, P. A. Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra. Nat. Methods 2011, 8 (7), 587−91.
(75) Bandeira, N. Protein identification by spectral networks analysis. Methods Mol. Biol. 2011, 694, 151−68.
(76) Bandeira, N.; Tsur, D.; Frank, A.; Pevzner, P. A. Protein identification by spectral networks analysis. Proc. Natl. Acad. Sci. U. S. A. 2007, 104 (15), 6140−5.
(77) Di Palma, S.; Zoumaro-Djayoon, A.; Peng, M.; Post, H.; Preisinger, C.; Munoz, J.; Heck, A. J. Finding the same needles in the haystack? A comparison of phosphotyrosine peptides enriched by immuno-affinity precipitation and metal-based affinity chromatography. J. Proteomics 2013, 91, 331−7.
(78) Moffatt, M. F.; Gut, I. G.; Demenais, F.; Strachan, D. P.; Bouzigon, E.; Heath, S.; von Mutius, E.; Farrall, M.; Lathrop, M.; Cookson, W. O. A large-scale, consortium-based genomewide association study of asthma. N. Engl. J. Med. 2010, 363 (13), 1211−21.
(79) Schroder, A.; Klein, K.; Winter, S.; Schwab, M.; Bonin, M.; Zell, A.; Zanger, U. M. Genomics of ADME gene expression: mapping expression quantitative trait loci relevant for absorption, distribution, metabolism and excretion of drugs in human liver. Pharmacogenomics J. 2013, 13 (1), 12−20.
(80) Gaudet, P.; Argoud-Puy, G.; Cusin, I.; Duek, P.; Evalet, O.; Gateau, A.; Gleizes, A.; Pereira, M.; Zahn-Zabal, M.; Zwahlen, C.; Bairoch, A.; Lane, L. neXtProt: organizing protein knowledge in the context of human proteome projects. J. Proteome Res. 2013, 12 (1), 293−8.
(81) Craig, R.; Cortens, J. C.; Fenyo, D.; Beavis, R. C. Using annotated peptide mass spectrum libraries for protein identification. J. Proteome Res. 2006, 5 (8), 1843−9.
(82) Goode, R. J.; Yu, S.; Kannan, A.; Christiansen, J. H.; Beitz, A.; Hancock, W. S.; Nice, E.; Smith, A. I. The proteome browser web portal. J. Proteome Res. 2013, 12 (1), 172−8.
(83) Jeong, S. K.; Lee, H. J.; Na, K.; Cho, J. Y.; Lee, M. J.; Kwon, J. Y.; Kim, H.; Park, Y. M.; Yoo, J. S.; Hancock, W. S.; Paik, Y. K. GenomewidePDB, a proteomic database exploring the comprehensive protein parts list and transcriptome landscape in human chromosomes. J. Proteome Res. 2013, 12 (1), 106−11.
(84) Guo, F.; Wang, D.; Liu, Z.; Lu, L.; Zhang, W.; Sun, H.; Zhang, H.; Ma, J.; Wu, S.; Li, N.; Jiang, Y.; Zhu, W.; Qin, J.; Xu, P.; Li, D.; He, F. CAPER: a chromosome-assembled human proteome browser. J. Proteome Res. 2013, 12 (1), 179−86.
(85) Vizcaino, J. A.; Cote, R. G.; Csordas, A.; Dianes, J. A.; Fabregat, A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J.; O’Kelly, G.; Schoenegger, A.; Ovelleiro, D.; Perez-Riverol, Y.; Reisinger, F.; Rios, D.; Wang, R.; Hermjakob, H. The PRoteomics IDEntifications (PRIDE) database and associated tools: status in 2013. Nucleic Acids Res. 2013, 41 (Database issue), D1063−9.
