Large-Scale Characterization and Analysis of the Murine Cardiac Proteome

Nicolas Bousette,†,‡ Thomas Kislinger,§,| Vincent Fong,⊥,# Ruth Isserlin,⊥,# Johannes A. Hewel,⊥,# Andrew Emili,*,⊥,# and Anthony O. Gramolini*,†,‡

Department of Physiology, University of Toronto, Heart and Stroke/Richard Lewar Centre of Cardiovascular Excellence, Department of Medical Biophysics, University of Toronto, and University Health Network, Toronto, Ontario, Canada, Banting and Best Department of Medical Research, and Donnelly Centre for Cellular and Biomolecular Research, Toronto, Ontario, M5G 1L6, Canada

Received October 8, 2008

Recent advances in mass spectrometry and bioinformatics have provided the means to characterize complex protein landscapes from a wide variety of organisms and cell types. Development of standard proteomes exhibiting all of the proteins involved in normal physiology will facilitate the delineation of disease mechanisms. Here, we examine the wild-type cardiac proteome using data obtained from a subcellular fractionation protocol in combination with a multidimensional protein identification proteomics approach. We identified 4906 proteins which were allocated to either cytosolic, microsomal, mitochondrial matrix or mitochondrial membrane fractions with relative abundance values in each fraction. We subjected these proteins to hierarchical clustering, gene ontology terms analysis, immunoblotting, comparison to publicly available protein databases, comparison to 4 distinct cardiac transcriptomes, and finally, to 6 other related proteomic data sets. This study provides an exhaustive analysis of the cardiac proteome and is the first large-scale investigation of the subcellular location for over 2000 unannotated proteins. With the use of a subtractive transcriptomics approach, we have also extended our analysis to identify ‘cardiac selective’ factors in our proteome. Finally, using specific filtering criteria, we identified proteotypic peptides for subsequent use in targeted studies of both mouse and human. Therefore, we offer this as a major contribution to the advancement of the field of proteomics in cardiovascular research.

Keywords: proteomics • organelle • informatics • MuDPIT • subcellular distribution

* To whom correspondence should be addressed. Dr. Anthony Gramolini, Department of Physiology, 112 College St. Rm. 307, University of Toronto, Toronto, ON, M5G 1L6. Tel, (416) 978-5609; fax, (416) 978-8528; e-mail, [email protected]. Dr. Andrew Emili, Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College St. Room 914, Toronto, ON, Canada M5S 3E1. Tel, (416) 946-7281; fax, (416) 978-7437; e-mail, [email protected].
† Department of Physiology, University of Toronto.
‡ Heart and Stroke/Richard Lewar Centre of Cardiovascular Excellence.
§ Department of Medical Biophysics, University of Toronto.
| University Health Network.
⊥ Banting and Best Department of Medical Research.
# Donnelly Centre for Cellular and Biomolecular Research.
10.1021/pr800845a CCC: $40.75 © 2009 American Chemical Society
Journal of Proteome Research 2009, 8 (4), 1887–1901. Published on Web 02/06/2009

Introduction

In recent decades, there have been major advances in the treatment of cardiovascular diseases. These include the introduction of diuretics, β-blockers, and ACE inhibitors.1-3 Despite these advances, heart disease is still the leading cause of morbidity and mortality in North America today.4-6 This suggests that, although these therapeutic modalities do affect the outcome of cardiovascular pathophysiology, they are not sufficient to completely prevent the progression of the disease. Cardiomyopathies of various etiologies are complex multifactorial diseases. Therefore, future therapies will require the analysis of several disease markers and modulators in a specific disease context. With the capability of advanced mass spectrometry and bioinformatics to identify and quantify thousands of proteins from complex mixtures, proteomics can provide insight into unknown protein factors with specific expression patterns. The resultant disease fingerprint may unravel complex challenges we presently face regarding the pursuit of novel disease targets and specificity of treatments.

Crucial to the delineation of disease mechanisms is the prior understanding of normal physiology. Therefore, there is a necessity to identify, categorize, and characterize all proteins responsible for propagating and maintaining normal physiological processes of human biology. Indeed, several groups have characterized various organ, cellular, and organellar proteomes, including proteomes of human plasma,7 mouse serum,8 major mouse organs,9-11 embryonic stem cells,12,13 cytosol,14 microsomes,15 mitochondria,16,17 nucleus,18 and nucleolus.19 Thus, the field of proteomics is growing geometrically as new studies can be validated and strengthened by previously characterized proteomes.

We recently performed a comparative proteomic profile of a phospholamban mutant mouse model of dilated cardiomyopathy.20 In that study, we compared the cardiac proteomic profile of the mutant mouse to that of its wild-type (WT) littermates in order to identify disease causing pathways and disease related biomarkers. That analysis was restricted to proteins with statistically significant deviations in expression profiles between diseased and wild-type mice, and there was no detailed characterization of the wild-type proteome. Therefore, our motivation for the current study was to take advantage of this large-scale data set and provide a broad and in-depth analysis focused solely on the wild-type cardiac protein landscape. Here, we have characterized the most comprehensive wild-type murine cardiac proteome to date. A stringent fractionation protocol was utilized to further define these cardiac proteins with respect to subcellular location. This proteome will provide the scientific community with a broad and informative report of the wild-type cardiac protein expression profile, including semiquantitative relative protein abundance and subcellular location data.

Experimental Procedures

Animals and Protein Sample Preparation. Recently, we performed a comparative proteomic profile of a phospholamban mutant mouse model of dilated cardiomyopathy.20 We extracted all spectra derived from WT heart tissue from that study to carry out the current bioinformatic analysis. Therefore, all data acquisition prior to spectral searching is as described therein. Briefly, we utilized 8, 16, and 24 week old male and female FVBN mice. Following CO2 asphyxiation, hearts from 6 mice were dissected and rinsed with ice-cold PBS. Ventricles were isolated and pooled. Subcellular fractionation of ventricular homogenates was carried out as previously described.21 Briefly, pooled ventricular tissue was homogenized in ice-cold lysis buffer (0.25 M sucrose/50 mM Tris with 1 mM MgCl2 (pH 7.4), 1 mM PMSF, 1 mM DTT, and proteinase inhibitor cocktail) with a glass douncer. The lysate was then centrifuged at 800g for 15 min. The resulting pellet containing the contractile apparatus and nuclei was discarded, while the supernatant was subjected to another centrifugation at 8000g to pellet the mitochondrial fraction. The supernatant was then centrifuged at 100 000g for 60 min to pellet the microsomal fraction, while the resulting supernatant contained soluble cytosolic proteins. The mitochondrial pellet was resuspended and sonicated lightly in 10 mM Hepes to disrupt mitochondria, and the resulting suspension was recentrifuged at 14 000g to pellet the mitochondrial membrane fraction, while the supernatant contained mitochondrial matrix proteins. Membrane-containing fractions, including the mitochondrial and microsomal pellets, were solubilized by the addition of 1.5% Triton X-100. Protein aliquots (100 µg) were precipitated, reduced, alkylated, and digested sequentially with endoproteinase Lys-C and trypsin.

Proteomic Analysis. Comprehensive gel-free shotgun sequencing of subcellular protein fractions was performed as previously described.21 Subcellular fractions were run separately (see table in methods supplement for run distribution across fractions and time points in Supporting Information). Briefly, peptide mixtures were solid-phase extracted, acidified with formic acid, and loaded manually onto biphasic 100 µm inner diameter microcapillary fused silica columns packed sequentially with strong cation exchange beads (Partisphere; Whatman, Clifton, NJ) and reverse phase resin (Zorbax Eclipse XDB-C18; Agilent Technologies, Mississauga, ON). The columns were placed in-line with a quaternary HPLC pump interfaced using electrospray ionization to an LTQ linear ion trap mass spectrometer (Thermo Finnigan, San Jose, CA). The bound peptides were eluted using a 12 step × 100 min salt/H2O/acetonitrile gradient. Precursor ions [400-2000 m/z] were subjected to data-dependent, collision-induced dissociation with dynamic exclusion enabled.

Database Searches. A total of 3 965 917 MS/MS spectra derived from wild-type (WT) mouse samples (thus excluding any spectra that were derived from the disease model) were then re-searched against an updated (March, 2008) nonredundant mouse protein sequence database populated with 69 614 mouse proteins obtained from EBI Integr8 (UniProt Knowledgebase Release 13.4). This search was carried out using the SEQUEST search algorithm [SEQUEST-PVM v.27 (rev. 9) (1993)] to match peptide tandem mass spectra to peptide sequences. The embedded ExtractMS script with default parameter settings was used to automatically generate peak lists. Precursor mass tolerance was set to 3 Da, with daughter ion mass tolerance set to the default of 0, enabling fully tryptic enzyme status, single-site missed cleavages, and a static chemical modification of +57 amu on cysteine for carboxyamidomethylation. To determine the empirical false-discovery rate (FDR), all of the spectra were searched against protein sequences in both the normal (Forward) and inverted (Reverse) amino acid orientations. The STATQUEST filtering algorithm was then applied to all putative search results to obtain a measure of the statistical reliability (confidence score) for each candidate identification (cutoff p-value 0.01, corresponding to a 99% or greater likelihood of being a correct match; see Supplemental Table S1).9,22 In our previous publication,20 cardiac tissues were fractionated,20,22 and the data were pooled across fractions. However, here we analyzed separately all spectra from all subcellular fractions (cytosol, microsomes, mitochondrial matrix and mitochondrial membranes). Only proteins identified with at least 2 spectra across all technical runs were included.
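
As an illustration of the inclusion filter above and of the fraction-assignment rule applied in the Results (a protein is binned to the fraction with the most spectral counts, and ties are labelled ‘mixed’), the following Python sketch may help. It is not the study's pipeline: the function name `allocate`, the fraction labels, the accessions, and the toy counts are all hypothetical, and the filter follows the Methods wording (at least 2 spectra in total).

```python
# Hypothetical fraction labels; the study names the four fractions but not these keys.
FRACTIONS = ("cytosol", "microsomes", "mito_matrix", "mito_membrane")


def allocate(counts, min_spectra=2):
    """Bin each protein to the fraction holding most of its spectra.

    counts maps a protein accession to {fraction: spectral count}, summed
    over technical runs. Proteins with fewer than min_spectra spectra in
    total are dropped; proteins whose top count is shared by more than one
    fraction are labelled 'mixed'.
    """
    allocation = {}
    for protein, per_fraction in counts.items():
        values = [per_fraction.get(f, 0) for f in FRACTIONS]
        if sum(values) < min_spectra:
            continue  # fails the inclusion filter
        top = max(values)
        winners = [f for f, v in zip(FRACTIONS, values) if v == top]
        allocation[protein] = winners[0] if len(winners) == 1 else "mixed"
    return allocation


# Toy input (made-up accessions and counts, for illustration only).
toy_counts = {
    "P_A": {"cytosol": 40, "microsomes": 3},
    "P_B": {"cytosol": 2, "mito_matrix": 2},  # tie between two fractions
    "P_C": {"mito_membrane": 1},              # a single spectrum overall
}
result = allocate(toy_counts)
print(result)
```

With this toy input, `P_A` is assigned to the cytosol, `P_B` falls into the ‘mixed’ group because its top count is tied, and `P_C` is excluded by the filter.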

Results

Protein Distribution. All the spectra from wild-type hearts were extracted from Gramolini et al.20 and were re-searched against an up-to-date database created in March 2008. These spectra were all associated with subcellular compartments, since a differential centrifugation protocol was used to isolate cytosolic, microsomal, and mitochondrial fractions.20 Mitochondrial proteins were further fractionated into mitochondrial matrix and mitochondrial membrane proteins. For this study, we recovered 3 965 917 spectra that were matched to 367 079 high confidence (99%) peptides, which in turn mapped to 8832 proteins. We only selected proteins identified by 2 or more high-confidence peptides (across all technical runs), resulting in a proteome consisting of 4906 nonredundant proteins (Figure 1A and Supplemental Table S1). Proteins were binned to the subcellular fraction in which they demonstrated the highest number of spectral counts, and thus were characterized as being enriched in that fraction. As a result, 1234 proteins were allocated to the cytosol, 939 to the microsomal fraction, 647 to the mitochondrial matrix, and finally, 846 to the mitochondrial membrane fraction (Figure 1B). A total of 1240 proteins had an equal number of spectral counts in more than 1 fraction. For example, a protein allocated to the ‘mixed’ subset may have two spectra in the cytosolic and two in the mitochondrial fractions. Because proteins could only be specifically allocated to one particular fraction if they had more spectra in one fraction than in any other, such a protein would be allocated to the ‘mixed’ fraction. Overall, 24%, 29%, 23%, and 24% of the spectra associated with proteins marked as ‘mixed’ were found in the cytosolic, microsomal, mitochondrial matrix, and mitochondrial membrane fractions, respectively. Importantly, the 1240 proteins that were ‘mixed’ represented only ∼1% of the total spectral counts and, therefore, likely represent low-abundance proteins.

To assess the degree of selection bias for protein size, we determined the molecular weight (MW) distribution for all identified proteins in the current proteome. Interestingly, comparison of the MW distribution pattern for proteins in this study to the MW distribution pattern of all mouse proteins listed in the Swiss-Prot database demonstrated a high degree of similarity. Indeed, correlation analysis showed that the two distribution patterns had a strong correlation (r = 0.963) (Figure S1A). However, there does appear to be a slight bias for larger proteins (>200 kDa) in our proteome.

To demonstrate the degree of protein enrichment in each fraction, we performed a statistical analysis using ANOVA of the spectral count distribution of proteins in each fraction (Figure S1B). Proteins allocated to a particular fraction were significantly greater in that fraction, demonstrating that, overall, proteins were not just marginally enriched, but that they showed a substantial and significant increase in spectral counts in the fraction in which they were found to be most abundant. For example, proteins allocated to the cytosol were ∼2-fold, ∼8-fold, and ∼30-fold more enriched in the cytosolic fraction than they were in the microsomal, mitochondrial matrix, and mitochondrial membrane fractions, respectively (Figure S1B, far left panel). Similar enrichments were observed for all fractions (Figure S1B).

Figure 1. Data acquisition and protein allocation. (A) Flowchart demonstrating the data acquisition and filtering process. (B) Pie graphs demonstrating the specific and mixed distribution of proteins to the four subcellular fractions.

Hierarchical Clustering Analysis. Hierarchical clustering analysis was carried out to demonstrate the partitioning of proteins to their respective clusters, representative of the 4 different subcellular fractions. We found that nearly all of the proteins enriched in a particular fraction clustered together (Figure 2). Indeed, we found that 99% of the proteins specifically enriched in the cytosolic fraction clustered together using hierarchical clustering analysis. Similarly, 94%, 98%, and 99% of the proteins specifically enriched in the microsomal, mitochondrial matrix, and mitochondrial membrane fractions clustered together, respectively.

Gene Ontology Enrichment. Protein clusters corresponding to cytosolic, microsomal, and mitochondrial fractions were analyzed by gene ontology (GO) term enrichment analysis. We analyzed each cluster for significantly enriched molecular function, biological process, and cellular component GO terms (Figure 2 and Supplemental Table S3A-D). For each cluster, we found a significant enrichment of GO terms that were appropriate for each respective fraction. Specifically, we found that the cytosolic protein cluster was enriched significantly in GO terms including cytosol, kinase activity, and carbohydrate metabolism. This enrichment is consistent with the cytoplasm being the site of action for many intracellular signaling kinases as well as glycolysis.23 The microsomal fraction was enriched in several molecular function, biological process, and cellular component GO terms including calcium ion binding, protein biosynthesis, and endoplasmic reticulum.
Since the microsomal fraction is composed of cellular organelles including the endoplasmic reticulum, which contains calcium-binding proteins such as calsequestrin and is essential for membrane-bound and secretory protein synthesis, these GO terms are consistent with the microsomal fraction. The mitochondrial matrix cluster was enriched in several GO terms including oxidoreductase activity, fatty acid metabolism, and mitochondrial matrix, while the mitochondrial membrane cluster was enriched in several GO terms including electron carrier activity, oxidative phosphorylation, and mitochondrial membrane, again demonstrating consistency with mitochondrial fractions.

Database Mining. To examine the specificity of the subcellular fractions, we searched the Swiss-Prot and TrEMBL protein databases to assign subcellular location annotations for each identified protein. We found that 59% (2887/4906) of the categorized proteins had a subcellular location annotation in either database. Next, we performed a weighted analysis, rather than assigning equal weight to each protein, to avoid a bias toward extremely low-abundance protein contaminants that do not accurately reflect the true protein composition of a particular fraction. Specifically, we compared the sum of spectral counts for all proteins allocated to a subcellular fraction with an annotation for that fraction against the sum of spectral counts for proteins in that same fraction with other subcellular location annotations. For example, of all the spectra associated with proteins that were both enriched in the cytosol and that had a specific subcellular location annotation, we found that 78% were from proteins annotated as cytosolic, confirming that a majority of proteins found to be enriched in the cytoplasm were, in fact, cytoplasmic proteins (Figure 3A,D).
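
The weighting described above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the record layout (`allocated`, `annotation`, `spectra`), the function name `weighted_agreement`, and the toy values are hypothetical, and unannotated proteins are simply left out of the denominator, as the text implies.

```python
def weighted_agreement(proteins, fraction):
    """Share of spectral counts, among annotated proteins allocated to
    `fraction`, that come from proteins annotated to that same fraction.

    Returns None when no annotated protein is allocated to the fraction.
    """
    annotated = [
        p for p in proteins
        if p["allocated"] == fraction and p["annotation"] is not None
    ]
    total = sum(p["spectra"] for p in annotated)
    if total == 0:
        return None
    matching = sum(p["spectra"] for p in annotated if p["annotation"] == fraction)
    return matching / total


# Toy records (made-up values, for illustration only).
toy = [
    {"id": "P1", "allocated": "cytosol", "annotation": "cytosol", "spectra": 90},
    {"id": "P2", "allocated": "cytosol", "annotation": "cytosol", "spectra": 30},
    {"id": "P3", "allocated": "cytosol", "annotation": "mitochondrion", "spectra": 30},
    {"id": "P4", "allocated": "cytosol", "annotation": None, "spectra": 10},
]
share = weighted_agreement(toy, "cytosol")
print(share)
```

In this toy case the weighted agreement is 120/150, whereas counting proteins equally would give 2 of 3; abundant proteins therefore dominate the weighted figure, which is the stated purpose of the weighting.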


Figure 2. Hierarchical clustering of cardiac proteins with the major subcellular compartments. Shown is an unbiased self-organizing heat map demonstrating the segregation of proteins into 4 main clusters representing the 4 subcellular fractions. To the right are key “Biological Process” GO terms found to be significantly enriched in each respective cluster.

95% of the spectral counts for proteins enriched in the mitochondrial fractions (i.e., both mitochondrial matrix and membrane fractions) were associated with proteins that had mitochondrial annotations (Figure 3C,F). For proteins enriched in the microsomal fraction, 54% of the spectra were associated with proteins having microsomal annotations (Figure 3B,E). The diverse nature of microsomal constituents may contribute to the reduced specificity. Indeed, microsomal annotations included ‘endoplasmic reticulum’, ‘golgi apparatus’, ‘endosome’, ‘lysosome’, ‘vesicles’, ‘membrane’ (without ‘mitochondrial’), and ‘secreted’. However, all fractions considered, the majority of proteins were assigned to their biologically relevant fraction, demonstrating consistency between our proteomic findings and the Swiss-Prot and TrEMBL protein databases.

Figure 3. Database searches. (A-C) Heat maps demonstrating the weighted clustering of protein annotations for cytosolic, microsomal, and mitochondrial fractions, respectively. More abundant proteins found with higher spectral counts are shown in blue, while proteins with lower spectral counts are shown in yellow with varying intensities of either color for those in between. (D-F) Bar graphs demonstrating annotation distributions for cytosolic, microsomal, and mitochondrial fractions. Each bar graph demonstrates the percentage of spectral counts associated with proteins that have annotations matching the fraction listed in the x-axis.

Immunoblotting Validation. To examine the specificity of fractionation of the cardiac tissue, we carried out immunoblot analyses against known subcellular markers. We selected subcellular markers including GAPDH for the cytosol, angiotensin converting enzyme (ACE) for the microsomes, pyruvate carboxylase for the mitochondrial matrix, and F1-ATP synthase (β-subunit chain) for the mitochondrial membrane fraction (Figure 4). These markers were chosen because of their well-established roles in the respective cellular fractions, the availability of their antibodies, and because we have previously demonstrated their specificity as subcellular markers.21 Immunoblot analyses clearly demonstrate the appropriate subcellular location for these proteins, as the immunoblot band is consistently found in the same fraction where spectral counts are most abundant.

Figure 4. Immunoblotting validation. Cardiac ventricular samples were isolated, fractionated, and subjected to immunoblotting. (A) Immunoblots for glyceraldehyde 3-phosphate dehydrogenase (GAPDH), angiotensin converting enzyme (ACE), pyruvate carboxylase, and ATP synthase (β-chain). (B) Representation of spectral counts obtained for these proteins in each fraction, respectively. Spectra in bold represent the highest recorded spectra for that protein.

Proteome/Transcriptome Correlation. Here, we complemented our proteomic analysis of the mouse heart by correlating it to a microarray analysis of mouse cardiac ventricular tissue (Supplemental Table S4). We also compared this microarray data set with 3 other publicly available data sets, hereafter referred to as GSM126, GSM147, and GSM173 (Supplemental Table S5, http://www.ncbi.nlm.nih.gov/geo/). These transcriptomes were also produced with mouse cardiac tissue on the same technical platform, the Affymetrix GeneChip Mouse Genome 430 2.0 Array. Our cardiac microarray consisted of 16 613 nonredundant Affymetrix probes with statistically significant intensity values as determined using the Robust Multichip Analysis algorithm. Similarly, only probes associated with statistically significant detection values were accepted for each external microarray. Comparison of the microarray data to these external data sets resulted in very high overlap and exceptionally strong correlations (GSM126, r = 0.954; GSM147, r = 0.961; and GSM173, r = 0.939, respectively), thus clearly validating the current transcriptome (Figure S2A-F). To rule out the possibility that these high correlations were strictly platform dependent, we also compared our microarray data set to another external data set derived from sperm proteins that was also analyzed on the same Affymetrix GeneChip. Correlation analysis with this biologically distinct data set showed a much lower correlation (r = 0.561), demonstrating that the former correlations are not strictly platform dependent (results not shown).

Next, we aimed to compare our cardiac proteome with the microarray data sets. To compare these data sets, we converted all Affymetrix probe accessions to UniGene accessions, which best circumvents the redundancy of multiple Affymetrix probes that represent one protein. The cardiac proteome was also converted from UniProt accessions to UniGene accessions so as to have an equivalent platform for comparison to the microarray data sets. We found that there was a reasonable overlap between the microarray (10 109 unique UniGene accessions) and the proteome (4628 unique UniGene accessions). Specifically, 2947 (60%) of the UniGene accessions that were identified in our cardiac proteome were also identified in the cardiac transcriptome (Figure 5A). This was associated with a moderate but highly significant correlation (r = 0.557). The other three external microarray data sets also exhibited comparable coverage of our cardiac proteome. Specifically, GSM126, GSM147, and GSM173 had 2997 (61%), 2925 (60%), and 3113 (63%) UniGene accessions in common with our cardiac proteome, respectively (Figure 5A). These microarray data sets also exhibited comparable correlations with the cardiac proteome (GSM126, r = 0.569; GSM147, r = 0.560; and GSM173, r = 0.566, respectively). In addition, due to the potential for spurious correlation with such large data sets, we aimed to evaluate the degree of correlation between the cardiac proteome and the microarray data set when the values were randomized (data not shown). Indeed, randomization of values completely disrupted the correlation (mean r = 0.029, 4 randomizations), again supporting the genuine nature of these correlations.

Of note was the discovery that 2702 UniGene accessions were commonly found in the cardiac proteome and the four microarray data sets (Figure 5A). Conversion of the UniGene accessions back to 2836 UniProt accessions demonstrated that, although these represented only 58% of the cardiac proteome in protein number, they represented 87.3% of the spectral counts for all proteins. This indicates that a majority of the proteins not found in common between all data sets are likely proteins with low expression.

Because of the inherent technical, bioinformatic, and biological difficulty in assigning correlations between mRNA and protein expression,24 we expected that many proteins would lack concordance with the expression levels of their cognate mRNA. As such, we attempted to assess whether there were any pronounced characteristics of proteins that exhibit discordant mRNA/protein ratios (Figure 5B). We determined the difference between observed and predicted values (i.e., residuals) for all protein/transcript pairs by linear regression analysis. Statistical analysis of the residuals indicated a mean of 2.46 and a standard deviation (SD) of 1.94. We arbitrarily labeled all proteins with residuals less than 1 SD as being ‘inliers’, and those with residuals greater than 2 SD as being ‘outliers’ (Figure 5C). Interestingly, the correlation score for the inliers group, representing 49% of the total protein/transcript matches, was extremely high (r = 0.915), while there was no correlation for those proteins deemed as outliers (r = 0.147) (Figure 5D,E). To identify characteristics of this outlier subset, we performed a gene ontology enrichment analysis. Interestingly, we found that proteins which showed discordant mRNA/protein profiles (i.e., ‘outliers’) were significantly enriched in several GO terms related to mitochondrial energy metabolism, including NADH dehydrogenase activity and cytochrome c oxidase activity, as well as in GO terms related to ribosomes, including ribosome biogenesis and assembly, while no such GO terms were enriched in the inlier subset (Supplemental Table S6A,B). Together this suggests that several mitochondrial and ribosomal proteins may be subject to substantial post-transcriptional and/or posttranslational regulation.

Figure 5. Proteome/transcriptome correlations. (A) Venn diagram representing the total (underlined) and common (italicized) UniGene accessions between the 4 transcriptomes and the current cardiac proteome (correlation coefficients for each comparison are in parentheses). (B) The scatter plot for the comparison of the current transcriptome to the cardiac proteome. (C) The residuals plot showing the deviation of observed from predicted values based on linear regression analysis. Inliers exhibited residuals of less than 1 standard deviation (SD), and outliers exhibited residuals of greater than 2 SD. (D) Scatter plot for the ‘inliers’ subset. (E) Scatter plot for the ‘outliers’ subset.

Proteome/Proteome Correlation. We evaluated the strength of the present data by comparing it to other proteomic surveys previously published (see Supplemental Tables S7-S12).9,10,14-17 First, we compared the subset of proteins found in the mouse heart from the Kislinger et al. data set.9 That heart proteome consisted of 1133 mouse proteins, of which 601 (53%) were found to be in common with our data set. Because of similarities in experimental design, we were able to compare protein abundance in cytosolic, microsomal and mitochondrial fractions from both data sets.
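
The correlation analyses in this and the preceding section compare paired abundance values for accessions shared by two data sets, and the proteome/transcriptome analysis additionally splits the pairs by regression residual. A minimal Python sketch of both steps follows. Several points are assumptions rather than details given in the article: residuals are taken as absolute values (suggested by the nonzero mean reported for them), the SD is the sample SD of those absolute residuals, pairs between 1 and 2 SD are given the hypothetical label `intermediate`, and the input values are a made-up toy series.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def label_by_residual(x, y):
    """Fit y = slope * x + intercept by least squares, then label each pair
    by its absolute residual: below 1 SD 'inlier', above 2 SD 'outlier',
    otherwise 'intermediate' (assumed handling of the gap)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [abs(b - (slope * a + intercept)) for a, b in zip(x, y)]
    sd = stdev(residuals)
    labels = []
    for res in residuals:
        if res < sd:
            labels.append("inlier")
        elif res > 2 * sd:
            labels.append("outlier")
        else:
            labels.append("intermediate")
    return labels


# Toy paired values: a straight line with one discordant pair at index 4.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 1, 2, 3, 14, 5, 6, 7, 8, 9]
r = pearson_r(x, y)
labels = label_by_residual(x, y)
print(round(r, 3), labels)
```

In the toy series a single discordant pair pulls the overall correlation well below 1 and is the only pair labelled an outlier, which mirrors, in miniature, how a subset of discordant pairs can depress a correlation that is strong among the remaining pairs.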
Correlation analysis demonstrated a high correlation between the expression profiles for proteins of the cytosol (r = 0.924), microsomes (r = 0.643), and mitochondria (r = 0.820) (Figure 6A,B). Next, we analyzed the cardiac segment of the proteomic study published recently by Cutillas et al.10 We identified 693 proteins with current and nonredundant UniProt accessions from their original data set of 941 proteins identified in the heart. Of the 693, we found that 479 (69%) of the proteins were in common with our data set. Because the Cutillas data set included an absolute quantification of protein abundance, we were able to perform correlation analysis with our data set. Indeed, we found a moderate-to-strong correlation (r = 0.621) of the expression profiles for the proteins in common between the two data sets (Figure 6A,C). The comparisons to the Kislinger and Cutillas studies show that the current study provides a much more in-depth mouse heart proteome. Indeed, following removal of the proteins identified in common with the other cardiac proteomes, our data contain 4054 proteins not identified in the other surveys (Figure 6D). Conversely, Kislinger et al. and Cutillas et al. have 519 and 201 proteins specific to their data sets, respectively.

We compared our proteomic fractions to those of others who have surveyed individual organelles, including mouse brain cytosol (Shin et al.14), rat liver microsomes (Gilchrist et al.15), and mitochondria from various murine tissues (Mootha et al.16), including the heart (Zhang et al.17). Interestingly, we found a high degree of overlap between our data set and those of the four other proteomic surveys (Table 1 and Figure 6A). Specifically, we found that 516 (66%) of the proteins identified in the Shin et al. cytosolic data set were also found in our complete data set. Furthermore, 477 (92.4%) of these proteins were likewise found in the cytosol in our data set.
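
Note that the two percentages reported for each organellar comparison use different denominators: the first is relative to the external data set, the second relative to the shared proteins. The Python sketch below illustrates the bookkeeping under an assumption the article does not spell out, namely that "found in" a fraction means at least one spectral count there. The function names, accessions, and toy counts are hypothetical; only the figures 477 and 516 come from the text.

```python
def overlap_and_concordance(external_ids, our_counts, expected_fractions):
    """Count accessions shared with an external organellar proteome and,
    among those, the ones detected in the expected fraction(s).

    our_counts maps accession -> {fraction: spectral count}.
    """
    shared = [acc for acc in external_ids if acc in our_counts]
    concordant = [
        acc for acc in shared
        if any(our_counts[acc].get(f, 0) > 0 for f in expected_fractions)
    ]
    return len(shared), len(concordant)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Toy data (made-up accessions and counts, for illustration only).
external = {"A", "B", "C", "D"}
ours = {
    "A": {"cytosol": 5},
    "B": {"cytosol": 1, "microsomes": 4},
    "C": {"mito_matrix": 3},
    "E": {"cytosol": 2},
}
n_shared, n_concordant = overlap_and_concordance(external, ours, ["cytosol"])
print(n_shared, n_concordant)

# The cytosolic figures quoted in the text: 477 of the 516 shared proteins.
print(percent(477, 516))
```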

Similarly, we retrieved 837 mouse orthologs from 1137 rat liver microsomal proteins identified by Gilchrist et al.15 Of these 837, 481 were present in our data set, with 421 (87.5%) of them also found in the microsomal fraction. From the 336 mitochondrial proteins identified by Mootha et al.,16 222 (66%) were also present in our data set. Of these 222 proteins, 211 (95.0%) were also identified in our mitochondrial fractions. Finally, we also compared our cardiac proteome to the recently published murine cardiac mitochondrial proteome in which Zhang et al.17 identified 940 proteins with IPI accessions. Conversion of these IPIs to UniProt accessions identified 674 proteins that were in common between these data sets. Of these 674 proteins, 656 (97.3%) were likewise identified in our mitochondrial fractions.

Transmembrane Helix Predictions. One of the advantages of gel-free proteomics is the ability to identify membrane proteins.25 Accordingly, we utilized the transmembrane helix prediction tool (TMHMM 2.0) to identify proteins within our data set that contain a predicted transmembrane domain and are, thus, likely to be membrane proteins. We found that 815 (17%) of the proteins in our data set contained at least one predicted transmembrane helix (TMH), while 360 contained two or more TMHs. In particular, 567 (69%) of the proteins with TMHs were allocated to either the microsomal or mitochondrial membrane fractions, consistent with fractions that should contain membrane proteins. More importantly, however, we found that 80% of the spectra for the 815 proteins with TMHs were from proteins of either the microsomal or mitochondrial membrane fractions, indicating a high relative specificity of the membrane-containing fractions for TMH-bearing proteins (Figure 6A).

Identification of Poorly Characterized and/or Unannotated Proteins. We found that a large subset of this cardiac proteome was unannotated with respect to subcellular location.
Specifically, we found that 2019 of the 4906 proteins lacked a specific “SUBCELLULAR LOCATION” annotation in the Swiss-Prot and TrEMBL protein databases. Therefore, our location predictions in the current proteome would be the first large-scale location annotation for this protein data set. Although we labeled these proteins as ‘unannotated’ because they lack a bioinformatic subcellular location annotation in the latter protein databases, a small portion of these proteins are actually well-known proteins, such as Myoglobin and Aconitase. These were nevertheless included in the unannotated subset because they fit the criterion of lacking a specific subcellular location annotation. This grouping notwithstanding, a significant proportion of the proteins in this ‘unannotated’ subset are poorly characterized. We performed hierarchical clustering analysis to demonstrate the segregation of this protein subset to their respective clusters representing the 4 subcellular fractions (Figure 7A). To validate our allocation of proteins to specific subcellular locations, we performed various database searches to link the 200 most abundant unannotated proteins to a particular fraction. Specifically, we searched for supportive evidence that a protein allocated to a particular fraction was actually a resident of that fraction. Supportive evidence included locations cited in the literature, annotations in the Human Protein Atlas, cellular component GO terms, and documented associations with proteins and/or protein pathways of known subcellular location (Figure 7B and Supplemental Table S13). Since the fraction where sarcomeric proteins would be found was not analyzed, we considered sarcomeric proteins as contaminants, and

Journal of Proteome Research • Vol. 8, No. 4, 2009 1893

research articles

Bousette et al.

Figure 6. Proteome comparisons. (A) Heat maps demonstrating (from left to right) the clustering of proteins in the current data set (i.e., n = 4906); proteins found in common between the current data set and Kislinger et al.9 (arrayed by subcellular location); proteins in common between the current data set and that of Cutillas et al.;10 proteins found in common between the current data set and the data sets of Shin et al.,14 Gilchrist et al.,15 Mootha et al.,16 and Zhang et al.;17 proteins with predicted transmembrane helices (TMHs). (B) Graph demonstrating the correlation of spectral counts for cytosolic, microsomal, and mitochondrial proteins in common with the Kislinger et al. data set. (C) Graph demonstrating the correlation of our spectral counts with percent of absolute proteins values, for proteins in common with the Cutillas et al. data set. (D) A Venn diagram demonstrating the number of proteins that were specific to each proteomic survey (underlined) and those that were in common between each survey (italicized).

Table 1. Number and Percentage of Proteins Found in Common between the Current Cardiac Proteome and Various Organellar Proteomes

proteome | no. of proteins in data set | no. of proteins found in our proteome | no. of proteins found in the fraction of interest | no. of proteins enriched in fraction of interest
Shin et al. | 789 | 516 | 477 (92.4%) | 311 (60.3%)
Gilchrist et al. | 837 | 481 | 421 (87.5%) | 187 (38.9%)
Mootha et al. | 336 | 222 | 211 (95.0%) | 179 (80.6%)
Zhang et al. | 940 | 674 | 656 (97.3%) | 447 (66.3%)

subtracted them from our analysis. Highly abundant blood proteins, including hemoglobin, were likewise subtracted. We found that 13 proteins (6.5%) were sarcomeric or blood protein contaminants and were disregarded. In addition, we found that there was no obvious evidence of subcellular location for 60 (30%) of the 200 most abundant unannotated proteins. In addition, 23 proteins were ribosomal proteins, which would


be expected to be enriched in both the cytoplasmic and microsomal fractions and thus cannot be correctly allocated to one particular fraction. Interestingly, 88.4% of the remaining proteins had evidence of originating from the same fraction in which they were found to be enriched in the current study and were thus deemed as ‘Consistent’. By contrast, 11.6% of the remaining proteins showed evidence for subcellular location

Murine Cardiac Proteome


Figure 7. Analyses of unannotated proteins. (A) Heat map demonstrating the segregation of unannotated proteins into 4 main clusters representing the 4 subcellular fractions. (B) Pie graph demonstrating the distribution of the 200 most abundant unannotated proteins as ‘Consistent’, indicating that there is evidence that the fraction in which these proteins were enriched is also their fraction of origin; ‘Inconsistent’, indicating that there is evidence that the fraction in which these proteins were enriched is not their fraction of origin; ‘Unknown Location’, indicating that no evidence was found to describe their subcellular location; ‘Ribosomal’, indicating that the protein is a ribosomal protein and thus cannot be allocated to one specific fraction; or ‘Contaminant’, indicating that the protein is a blood or sarcomeric protein and thus is excluded from the analysis.

that was in conflict with the fraction in which they were enriched, and were thus marked as ‘Inconsistent’. To examine further the subset of poorly annotated proteins, we mined the recently published confocal proteome,26 which is accessible through the Human Protein Atlas (www.proteinatlas.org). We searched this confocal image database for matches of the 200 most abundant ‘unannotated’ proteins. We found that 6 proteins matched exactly by name, while another 9 proteins were matched to either an isoform or a kindred subunit in this confocal subproteome. Confocal immunofluorescence images for these proteins were collected (Figure 8 and Supplemental Figure S3). Here we show that the assignment to a subcellular compartment was confirmed by confocal microscopy. Specifically, adenylosuccinate lyase had cytoplasmic immunoreactivity, proteasome subunit beta type 4 showed microsomal immunoreactivity, and isocitrate dehydrogenase was present in the mitochondria, all of which are consistent with the proteomic subcellular localization assigned to them (Figure 8 and Supplemental Table S14). Likewise, the subcellular location as determined by confocal immunofluorescence for Aconitase, Eukaryotic translation initiation factor 5A-1, Fibrinogen beta chain precursor, Formyltetrahydrofolate synthetase, Fructose-bisphosphate aldolase A, Glycogen phosphorylase,

Glycogenin-2, Major vault protein, Myoglobin, NADH dehydrogenase [ubiquinone] 1 beta subcomplex subunit, Proteasome activator complex subunit 1, and Ubiquitin carboxyl-terminal hydrolase 5 was in agreement with the subcellular locations predicted by our proteomic analysis, thus further increasing the confidence in the current cardiac proteome (Supplemental Figure S3 and Table S14). Another 3 proteins (Vimentin, 26S proteasome non-ATPase regulatory subunit 1 and subunit 2) were matched that showed immuno-localization that was not in agreement with our proteomic findings; however, these proteins were already grouped as ‘inconsistent’ based on other evidence and hence the latter images were not included.

Identification of Cardiac Selective Factors. We were interested in determining which of the proteins of the current proteome are found most abundantly in the heart and are therefore cardiac enriched proteins. We selected transcriptomes profiling murine heart, skeletal muscle, brain, lung, liver, kidney and testis. While not covering every possible tissue, we did select several major organs, which form the basis for assessing cardiac selectivity. Furthermore, all of the latter transcriptomes were analyzed with the identical study platform as our transcriptome, the Affymetrix GeneChip 430 2.0.


Figure 8. Confocal immunofluorescence images obtained from the Human Protein Atlas. Immunofluorescence (A, E, I), cytoskeletal phalloidin staining (B, F, J), nuclear DAPI staining (C, G, K), and the composite images showing integrated fluorescence (D, H, L). (A-D) A-431 cells were stained to detect endogenous levels of Adenylosuccinate lyase (accession PUR8_MOUSE) with cytoplasmic immunoreactivity. (E-H) Proteasome subunit beta type 4 (accession PSB5_MOUSE) showing microsomal immunoreactivity (i.e., annotated as “vesicles”). (I-L) Isocitrate Dehydrogenase (accession Q91VA7_MOUSE) with mitochondrial immunoreactivity.

For this analysis, we compared the entire transcriptome consisting of 45 101 Affymetrix probes for each mouse tissue. Interestingly, ∼10 000 probes exhibited expression intensity below the threshold for all tissues and were thus discarded from further analysis. Correlation analyses demonstrate that our cardiac transcriptome was most similar to the other cardiac transcriptome (r = 0.91) and skeletal muscle transcriptome (r = 0.75; Figure 9A). As expected, the other tissues showed drastically reduced correlation of expression intensity against our cardiac transcriptome (r = 0.30-0.62). To identify transcripts that were selectively enriched in the heart, we performed hierarchical clustering analysis for all transcriptomes (Figure 9B). We found 950 probes in our cardiac transcriptome that were at least 10-fold greater than the average expression of the other six noncardiac tissues. Cross-mapping these latter probes to the current proteome led to the identification of 166 proteins (Supplemental Table S15). Of note, several well-established cardiac proteins were found to be included in this subset of cardiac enriched proteins including SERCA2A, atrial natriuretic factor, as well as the cardiac muscle isoform of calsequestrin. GO term analysis for the 166 proteins identified several biological function terms related to cardiac


physiology including muscle contraction, circulation, regulation of heart contraction, and cardiac inotropy. Our next aim was to confirm the heart specificity of these 166 cardiac selective factors. Therefore, we determined mRNA expression levels in the heart, skeletal muscle, brain, lung, liver, kidney and testis for 7 randomly chosen genes that have not been previously characterized as cardiac genes. Specifically, we assayed Growth arrest-specific protein 6, PTRF/SDPR family protein, LIM and calponin homology domains-containing protein 1, Leucine-rich repeat-containing protein 2, Cysteine and glycine-rich protein 3, PRELI domain-containing protein 2 and a novel gap junction protein. In addition, we assayed the cardiac isoform of calsequestrin as a control. Indeed, these genes exhibited substantially lower expression levels in all tissues relative to the heart (Figure 9C). Specifically, the latter genes exhibited from ∼3- to >10-fold lower expression in the noncardiac tissues compared to the heart.

Proteotypic Peptide Subset. Here, we aimed to identify the best candidate peptides for each protein. Therefore, we filtered our original list of total peptides (i.e., 367 079 peptides) to remove any peptide with a charge of less than 2 and an Xcorr score of less than 2.2. We then removed all peptides that


Figure 9. Cardiac selective factors. (A) Graph demonstrating the Pearson correlation coefficients for microarray expression intensities for genes across tissues. (B) A heat map representing the relative expression intensities for genes across several tissues as determined by microarray analysis. (C) mRNA expression levels of genes (listed by their UniProt IDs) across tissues relative to the mRNA expression in cardiac tissue as determined by real-time RT-PCR analysis. Genes were normalized to the mRNA levels of the ribosomal gene, RPL34.

contained either methionine or cysteine residues due to their susceptibility to oxidative modification or alkylation, respectively. The resultant 5430 peptides were then scored based on an algorithm taking into consideration three different criteria that can be used to grade the proteotypic potential of each peptide. The first criterion was relative abundance, determined by the ratio of the number of times a particular peptide was identified vs the total number of peptides used to identify the

cognate protein. The second criterion considered the presence/absence of a peptide compared to the presence/absence of the cognate protein across experimental runs. The final criterion was a measure of the overall propensity for detection of the peptide in question (quantified as spectral counts), which we transformed using a logarithmic scale to moderate the effect of this wide ranging variable (Supplementary Figure S4A). As a result, 5430 potential proteotypic peptides for 2272 proteins

were identified and are presented as a ranked list (Supplementary Figure S4B and Table S16A), and as a list of best scoring peptide for each of the proteins (Supplementary Figure S4C and Table S16B). To identify potential human orthologs, we then searched all unique peptides identified from the current proteomic survey against an EBI human FASTA file (created in March 2008). This resulted in a total of 8074 mouse peptides that mapped to human proteins. We matched 2188 of the latter peptides to our proteotypic peptide list, with 1875 of those mapping to 927 unique orthologous mouse/human proteins (Supplementary Table S16C). Therefore, these 1875 peptides represent potential peptides for mass spectrometry assays involving both human and mouse tissue.
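The peptide filtering steps described in this section (charge, Xcorr, and exclusion of methionine- or cysteine-containing sequences) can be sketched as a simple predicate. This is an illustrative sketch only: the `Peptide` record, the function name, and the example sequences are hypothetical, and the sketch assumes that a peptide is dropped if it fails any one of the criteria, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peptide:
    sequence: str   # one-letter amino acid codes
    charge: int     # precursor charge state
    xcorr: float    # SEQUEST-style cross-correlation score


def passes_filters(peptide, min_charge=2, min_xcorr=2.2):
    """Return True if the peptide survives all three filters.

    Assumption: failing any single criterion removes the peptide.
    """
    if peptide.charge < min_charge:
        return False
    if peptide.xcorr < min_xcorr:
        return False
    # Methionine (M) is prone to oxidation and cysteine (C) to alkylation.
    if "M" in peptide.sequence or "C" in peptide.sequence:
        return False
    return True


# Hypothetical example peptides.
peptides = [
    Peptide("LVNELTEFAK", 2, 3.1),  # kept
    Peptide("AMDLK", 2, 3.0),       # contains M
    Peptide("GTPEK", 1, 2.5),       # charge below 2
    Peptide("SLTEK", 3, 1.9),       # Xcorr below 2.2
    Peptide("ACDEK", 2, 2.5),       # contains C
]
kept = [p.sequence for p in peptides if passes_filters(p)]
```

The thresholds are exposed as parameters so that a stricter or looser cut can be tried without changing the predicate itself.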

Discussion

In the present study, we analyzed data collected from a subcellular fractionation protocol and a well-established MudPIT proteomic analysis to produce a comprehensive proteome of the mouse heart. This cardiac proteome consists of 4906 proteins characterized by subcellular location and relative abundance. The power of this study resides in the potential of this data set to be a current and accurate resource for the study of cardiac-related diseases. We initially segregated most proteins to one of four cellular compartments including the cytosol, microsomes, mitochondrial matrix and mitochondrial membrane, based on which fraction contained the greatest number of spectral counts for each particular protein. Although not as precise as other methods of quantitative mass spectrometry, this semiquantitative approach of spectral counting provides a rough estimate of relative protein abundance but is not an indicator of absolute protein abundance. Nonetheless, this simple algorithm proved to be accurate in light of the multiple data validations performed. Specifically, the subcellular allocations were compared, contrasted, and validated against protein database annotations as well as numerous other data sets including those of other proteomic surveys. Importantly, our data set demonstrates minimal bias against protein size as evidenced by the strong correlation between the MW distribution of our proteomic data set and the MW distribution of the ∼15 000 mouse proteins listed in the Swiss-Prot database. These comparative analyses with public databases as well as previously published studies consistently demonstrate significant correlations with our data set.

Subcellular Localizations. We first aimed to examine our subcellular allocations using the publicly available Swiss-Prot and TrEMBL protein databases. These databases were searched for subcellular location annotations.
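The allocation rule described in the Discussion above, assigning each protein to the fraction with the greatest number of spectral counts, can be illustrated as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the handling of ties and of proteins with no counts (both returned as unassigned here) is an assumption, since the text says only that "most" proteins were segregated.

```python
FRACTIONS = (
    "cytosol",
    "microsomes",
    "mitochondrial matrix",
    "mitochondrial membrane",
)


def assign_fraction(spectral_counts):
    """Return the fraction with the greatest spectral count for one protein.

    `spectral_counts` maps fraction name -> spectral count; missing
    fractions are treated as zero. Returns None when there are no counts
    or when the maximum is shared by several fractions (assumed rule).
    """
    best = max(spectral_counts.get(f, 0) for f in FRACTIONS)
    if best == 0:
        return None
    winners = [f for f in FRACTIONS if spectral_counts.get(f, 0) == best]
    return winners[0] if len(winners) == 1 else None


# Hypothetical counts for three proteins.
clear_case = assign_fraction({"cytosol": 12, "microsomes": 3})
tied_case = assign_fraction({"cytosol": 5, "mitochondrial matrix": 5})
empty_case = assign_fraction({})
```

Because spectral counting is only semiquantitative, an assignment made this way is best read as an enrichment call and not as proof of exclusive residence in that fraction.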
We found a high correlation between our subcellular allocations and the annotations listed in the databases for cytosolic and mitochondrial fractions, while it was lower for that of the microsomal fraction. These results substantiate the proteomic analysis presented here by demonstrating that the majority of the annotated proteins were found in the expected subcellular location. The lower correlation for subcellular annotations in the microsomes is likely due to the fractionation procedure, the poor database annotations of that fraction, as well as the dynamic nature of this fraction. For instance, the microsomes are composed of organelles such as the Golgi, ER, vesicles, and plasma membrane. In addition, several protein complexes tend to be large and would be expected to be isolated with the microsomal fraction due to their dense and/or insoluble nature. Also, the presence of cytosolic proteins in the microsomal


fraction might be due to the strong noncovalent associations of cytosolic proteins with membranes, not to mention those that are anchored to the membrane by posttranslational modifications such as myristoylation.27 Finally, the SR/ER and mitochondria are both functionally and spatially linked.28,29 Thus, these two organelles may be physically linked and true isolation of either organelle is challenging. Therefore, when assigning subcellular locations, the physical nature of the protein must also be considered. However, future studies may require more stringent biochemical and mechanical fractionation procedures to improve the purity of cell fractions such as those employed to analyze the cardiac mitochondrial proteome.17

We compared the subcellular predictions of the current proteome with those listed in the Swiss-Prot and TrEMBL databases because they are the most complete and most accessible resources for protein data. This fact notwithstanding, these databases contain a high degree of ambiguity and loosely supported annotations. Indeed, a very large number of protein annotations are “probable” or indicated “by similarity”.

Immunoblotting Validation. Our next aim was to interrogate our data set using immunoblot analysis. We selected four organellar markers including GAPDH for the cytosol, ACE for the microsomal fraction, pyruvate carboxylase for the mitochondrial matrix, and ATP synthase for the mitochondrial membrane fraction. In each case, the immunoblotting analysis demonstrated high specificity of the protein in the expected fraction. Furthermore, there was a significant correlation between the immunoblots and spectral counts for all organellar markers. It may be argued that the proteomic analysis was not specific as there were relatively high spectral counts for these organellar markers in more than 1 subcellular fraction.
This discrepancy is a common problem for high-abundance proteins isolated from complex protein mixtures, which, owing to their high levels, readily contaminate all fractions during the fractionation procedure,22 and are detected by the high sensitivity of today’s mass spectrometers compared to the relatively lower sensitivity of immunoblotting and the nonlinear nature of chemiluminescent signal detection.

Microarray Comparisons. Microarray analysis is a powerful tool that is commonly utilized today to describe global mRNA expression patterns.30,31 Here, we examined cardiac microarray data and show a high degree of correlation (r = 0.94-0.96) with three other distinct mouse cardiac transcriptomes. Interestingly, we ruled out the possibility that the correlation was strictly platform dependent by showing a drastic reduction in correlation score when our cardiac transcriptome was compared to a sperm transcriptome which was produced using the same technological platform (i.e., the same Affymetrix GeneChip). We then compared our cardiac proteome to the 4 cardiac transcriptomes. Logically, it can be appreciated that the presence of a protein is dependent on the presence of its mRNA; however, the reverse is not true. Furthermore, there are many post-transcriptional and post-translational events that can alter the presence of either the transcript or the protein itself. This notwithstanding, it is not unprecedented to find a concurrence between mRNA and protein levels using both proteomic and microarray analytical platforms.13,32-36 Bearing this in mind, we aimed to determine if a concordance exists on a global level within our proteomic and microarray data sets. Indeed, analysis between mRNA abundance and protein spectral counts demonstrated a moderate but highly significant correlation between our cardiac proteome and the 4 cardiac transcriptomes.
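The correlation coefficients quoted throughout this work are Pearson r values. A minimal sketch of how such a coefficient could be computed for paired mRNA intensities and spectral counts is given below. The function names and the example values are hypothetical, and the log10 transform (with a pseudocount of 1 on the spectral counts) is an assumption made for illustration, since the transformation applied before correlation is not stated in this section.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def log_abundance_correlation(mrna_intensities, spectral_counts):
    """Correlate log10 mRNA intensity with log10(spectral count + 1).

    The log transform and the pseudocount are assumptions of this sketch.
    """
    log_mrna = [math.log10(v) for v in mrna_intensities]
    log_counts = [math.log10(c + 1) for c in spectral_counts]
    return pearson_r(log_mrna, log_counts)


# Hypothetical paired values for three genes.
r_example = log_abundance_correlation([10, 100, 1000], [9, 99, 999])
```

Only proteins and transcripts that can be matched to each other enter such a comparison, so the resulting r describes the shared subset and not either data set as a whole.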

We also identified the ‘inliers’ and ‘outliers’ from the total set of common protein/transcripts. Interestingly, we found that there was a relatively high number of proteins with matched transcript expression levels. Indeed, this ‘inliers’ subset, which represented ∼50% of the total, had a very high correlation of protein/mRNA abundance, while the ‘outliers’ subset, which was comparably small (∼20%), completely lacked any abundance correlation. Of note, through GO term enrichment analysis we found that this ‘outlier’ set contained many proteins involved in mitochondrial energy production and ribosome synthesis, indicating that in the heart, proteins related to the latter two functions have a high degree of post-transcriptional and/or post-translational regulation. Interestingly, we found that NADH dehydrogenase subunits exhibited much lower protein levels than expected based on the transcript levels. Consistent with this finding, Bai et al. found that transcript levels for NADH dehydrogenase were far in excess of NADH dehydrogenase protein levels in WT mice.37 Therefore, the discrepancies observed in the current study between protein and transcript levels have a biological precedent. However, we cannot rule out that at least some of the outliers are the result of technical limitations, especially for low-abundance proteins.

Proteome Comparisons. Comparison of our data to other proteomic data sets is inherently more appropriate; however, many caveats must still be considered. These include, but are not limited to, protein source and analytical methods. Indeed, protein source can vary by species, tissue, organ, cell, and organelle, while different analytical platforms can introduce study variables such as different selection biases for proteins.
Nevertheless, comparison of different proteomic studies can be an informative analysis; therefore, we evaluated the similarities and differences between our mouse heart proteomic survey and others including two other mouse heart surveys,9,10 as well as organellar surveys of the cytosol, isolated from the mouse brain,14 microsomes, isolated from rat liver,15 and mitochondria, isolated from mouse brain, heart, kidney, and liver,16 and from mouse heart.17 We found high correlations between our data set and that of Kislinger et al.,9 and Cutillas et al.10 These strong correlations not only validate our data against other published proteomes, but also demonstrate a high degree of reproducibility in proteomic detection, especially with the use of similar analytical techniques and data handling. Our proteomic data set is noticeably more penetrating (4906 proteins) compared to that of both Kislinger et al. (1133 proteins) and Cutillas et al. (941 proteins). This large discrepancy regarding proteomic depth is likely due to technical differences in mass spectrometry and experimental design. Indeed, the data for this study was generated on an LTQ linear ion trap mass spectrometer, while Kislinger et al. utilized an LCQ DECA XP ion trap mass spectrometer. The LTQ has been shown to be superior with respect to scan speed and sensitivity, resulting in more than 5-fold greater protein identifications.38 In addition, we subfractionated the cardiac tissue into subcellular fractions, thus, reducing sample complexity and increasing protein coverage, whereas Cutillas et al. did not carry out this step. However, we did not identify all the proteins found in their studies. The stochastic nature of proteomic analyses coupled with the fact that we utilized different mouse strains and different analytical instruments may be the cause for this limitation in our study. There are a multitude of study variables that can also account for the lack of perfect correlations between our findings and

those of others including the use of different species (i.e., Gilchrist et al. utilized the rat), tissue source (i.e., Mootha et al. utilized mitochondria from four different organs), and analytical platforms (i.e., Shin et al. utilized 2-DE). However, we still demonstrate a very high degree of mutual proteins in matched proteomic surveys. More importantly, we show that a majority of those proteins that were identified in common between our data set and the latter organellar proteomic surveys were identified in the identical subcellular locations. Of note, because we surveyed several subcellular fractions simultaneously, we can assign proteins as enriched in certain fractions. Therefore, we can assign even greater confidence in the proteins that were enriched in a particular fraction and were likewise matched in the respective organellar survey. As such, our proteome can serve to focus future searches on a higher confidence subset of organellar enriched proteins (Table 1).

In addition to comparison of our data to others, we also utilized a transmembrane domain prediction tool to identify proteins with TMHs. We found that 815 proteins contained 1 or more TMHs, arguing against a selection bias for nonmembrane proteins within this proteome. Furthermore, a majority of proteins with predicted TMHs were found in membrane-containing fractions, thus further validating our subcellular allocation algorithm.

‘Unannotated’ Data Set Characterization/Validation. Importantly, 41% of the proteins, corresponding to 19% of the total spectra, lacked a specific subcellular location annotation in the Swiss-Prot and TrEMBL protein databases. Therefore, we aimed to characterize this poorly annotated fraction. Specifically, we found that this subset of proteins clustered to the 4 subcellular fractions similarly to the entire data set.
Because of the large number of unannotated proteins identified, we could not verify the subcellular location by means of GFP-tagging or other immunological methods. Instead, this was carried out in an in silico analysis by means of searching various sources for supportive evidence of subcellular location for the 200 most abundant nonannotated proteins. Disregarding protein contaminants (hemoglobin isoforms and sarcomeric proteins), ribosomal proteins, and proteins which could not be linked to a particular subcellular location, we found that 88.4% of the remaining proteins could be linked with evidence indicating they were in fact from the fraction in which they were enriched. Importantly, 11 of the 12 proteins that had evidence of being from a subcellular location which was distinct from the fraction in which they were enriched exhibited physical properties which could explain the discrepancy in fractional enrichment. Indeed, the 11 discordant proteins were either part of the proteasome complex, the translational apparatus, the cytoskeleton, or had a very large mass (i.e., 310 kDa). Therefore, these proteins have physical characteristics which can result in their isolation in the nonnative fractions. We also searched the recently published confocal proteome for proteins that matched our unannotated list. Of the 200 proteins for which we searched, 18 could be matched in the latter database, either directly or by isoform/subunit linkage. Fifteen of the latter proteins exhibited immuno-localization that matched our proteomic survey. Therefore, we demonstrate that the subcellular location of the majority of unannotated proteins identified in the current study is supported by both biological and bioinformatics evidence. It is not unexpected that we found less than 10% of the proteins for which we searched in the

confocal proteome, considering the relatively poor characterization of this ‘unannotated’ list. Altogether, this subset can be regarded as a novel data source that will enhance the characterization and validation of future proteomic studies. Furthermore, the large number of poorly characterized proteins in this subset may be investigated as novel targets in cardiac disease. For instance, this data set could be utilized to get a basic idea of the subcellular location and relative abundance of poorly characterized proteins, which could greatly aid in further specific characterizations. This demonstrates the considerable degree of novel data in this analysis of the mouse cardiac proteome, which can be considered the first large-scale report of subcellular location and relative abundance for over 2000 mouse cardiac proteins.

Cardiac Selectivity. Tissue specificity can be an important characteristic of proteins, and as such, we were interested in identifying cardiac specific or enriched proteins within our cardiac proteome. To do this, we carried out a comparative transcriptomic approach in which we evaluated the expression of genes across a range of tissues using publicly available microarray databases. From this analysis, we identified a large subset of cardiac ‘selective’ genes that were then cross-mapped to 166 proteins in our current cardiac proteome. Not only did this latter protein subset contain several established cardiac markers, but GO term analysis also demonstrated the cardiac nature of these proteins. To validate the cardiac enrichment of these factors, we analyzed some cognate genes by RT-PCR analysis to evaluate their expression in an assortment of mouse tissues. Interestingly, the randomly selected genes exhibited a pronounced enrichment of mRNA expression in the heart compared to skeletal muscle, brain, lung, liver, kidneys, and testis.
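The cardiac selectivity criterion reported in the Results (heart expression at least 10-fold greater than the average of the six noncardiac tissues) can be written as a short filter. This sketch is illustrative only: the function name, the tissue keys, and the intensity values are hypothetical, and treating a nonpositive noncardiac average as "not selective" is an assumption of the sketch.

```python
NONCARDIAC = ("skeletal muscle", "brain", "lung", "liver", "kidney", "testis")


def is_cardiac_selective(expression, min_fold=10.0):
    """Return True if heart expression is >= `min_fold` times the mean
    expression across the noncardiac tissues.

    `expression` maps tissue name -> expression intensity for one probe.
    """
    mean_other = sum(expression[t] for t in NONCARDIAC) / len(NONCARDIAC)
    if mean_other <= 0:
        # Assumed guard: a fold change is not meaningful without a baseline.
        return False
    return expression["heart"] / mean_other >= min_fold


# Hypothetical probe intensities.
enriched_probe = {"heart": 1000.0, **{t: 50.0 for t in NONCARDIAC}}
ubiquitous_probe = {"heart": 300.0, **{t: 50.0 for t in NONCARDIAC}}

enriched = is_cardiac_selective(enriched_probe)      # 20-fold over the mean
ubiquitous = is_cardiac_selective(ubiquitous_probe)  # 6-fold over the mean
```

Because the comparison uses the average of the noncardiac tissues, a probe that is also high in a single related tissue such as skeletal muscle can still pass the filter.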
However, one gene, Leucine-rich repeat-containing protein 2, exhibited comparable expression levels in the skeletal muscle and heart, which is not surprising considering the similar physiology of these two tissues. Therefore, these cardiac ‘selective’ factors can be used as a basis for future investigation into cardiac biology.

Proteotypic Peptides. Technological development of proteomic techniques is increasingly supporting the advent of targeted searches of specific peptides. Such targeted searches require detailed knowledge of representative, or proteotypic, peptides for specific proteins. Therefore, we aimed to identify proteotypic peptides from our total set of high confidence peptides. With the use of widely accepted filtering criteria, we identified 5430 peptide candidates that were then graded using an algorithm measuring abundance and observation frequency to rank the proteotypic potential of each peptide. Therefore, the data presented here can be used as a resource for identifying top candidate peptides in future targeted searches of mouse and human cardiac proteins.
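The grading of proteotypic potential rests on three criteria given in the Results: relative abundance of the peptide within its protein, agreement between peptide and protein detection across runs, and log-transformed spectral counts. The sketch below shows one way these terms could be combined. The function name, the run identifiers, the use of log10(1 + count), and the unweighted sum are all assumptions made for illustration; the actual scoring formula is not given in this section.

```python
import math


def proteotypic_score(peptide_count, protein_peptide_total,
                      runs_with_peptide, runs_with_protein):
    """Combine the three criteria into one score (assumed unweighted sum).

    peptide_count: times this peptide was identified (spectral count)
    protein_peptide_total: total peptide identifications for the protein
    runs_with_peptide / runs_with_protein: sets of run identifiers
    """
    # Criterion 1: share of the protein's identifications due to this peptide.
    relative_abundance = peptide_count / protein_peptide_total
    # Criterion 2: fraction of protein-positive runs that also saw the peptide.
    run_agreement = len(runs_with_peptide & runs_with_protein) / len(runs_with_protein)
    # Criterion 3: log-scaled detectability to dampen very large counts.
    detectability = math.log10(1 + peptide_count)
    return relative_abundance + run_agreement + detectability


# Hypothetical peptides from the same protein (18 identifications, 4 runs).
runs_protein = {1, 2, 3, 4}
score_a = proteotypic_score(9, 18, {1, 2}, runs_protein)
score_b = proteotypic_score(3, 18, {1}, runs_protein)
ranked = sorted([("peptide_a", score_a), ("peptide_b", score_b)],
                key=lambda item: item[1], reverse=True)
```

Sorting by the combined score yields a ranked list, and taking the top entry per protein yields a best-peptide list; with real data the three terms might need weighting or normalization so that no single criterion dominates.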

Acknowledgment. We would like to thank Mahima Agochiya, Brian Cox, and Igor Jurisca for data analysis. This study was supported by the Heart and Stroke Foundation of Ontario (#T-6281), Canadian Institutes of Health Research (MOP-84267), Genome Canada through the Ontario Genomics Institute, Canadian Foundation for Innovation, Connaught Foundation, and the Heart and Stroke/Richard Lewar Centre of Cardiovascular Excellence. N.B. is a Fellow of the Heart and Stroke Foundation of Canada. A.O.G. and T.K. are Canada Research Chairs, and A.O.G. is a New Investigator of the Heart and Stroke Foundation of Canada. A.E. is the Ontario Chair in Biomarkers.


Supporting Information Available: Methods section, supplemental figures, and supplemental Tables S1-S16. This material is available free of charge via the Internet at http://pubs.acs.org.

References

(1) Yusuf, S.; Sleight, P.; Pogue, J.; Bosch, J.; Davies, R.; Dagenais, G. Effects of an angiotensin-converting-enzyme inhibitor, ramipril, on cardiovascular events in high-risk patients. The Heart Outcomes Prevention Evaluation Study Investigators. N. Engl. J. Med. 2000, 342 (3), 145–153.
(2) Hansteen, V.; Moinichen, E.; Lorentsen, E.; Andersen, A.; Strom, O.; Soiland, K.; Dyrbekk, D.; Refsum, A. M.; Tromsdal, A.; Knudsen, K.; Eika, C.; Bakken, J., Jr.; Smith, P.; Hoff, P. I. One year’s treatment with propranolol after myocardial infarction: preliminary report of Norwegian multicentre trial. Br. Med. J. 1982, 284 (6310), 155–160.
(3) Kasama, S.; Toyama, T.; Hatori, T.; Sumino, H.; Kumakura, H.; Takayama, Y.; Ichikawa, S.; Suzuki, T.; Kurabayashi, M. Effects of torasemide on cardiac sympathetic nerve activity and left ventricular remodelling in patients with congestive heart failure. Heart 2006, 92 (10), 1434–1440.
(4) Ho, K. K.; Anderson, K. M.; Kannel, W. B.; Grossman, W.; Levy, D. Survival after the onset of congestive heart failure in Framingham Heart Study subjects. Circulation 1993, 88 (1), 107–115.
(5) Kung, H.-C.; Hoyert, D. L.; Xu, J.; Murphy, S. L. Deaths: Final Data for 2005. Natl. Vital Stat. Rep. 2008, 56 (10), 1–121.
(6) Rosamond, W.; Flegal, K.; Furie, K.; Go, A.; Greenlund, K.; Haase, N.; Hailpern, S. M.; Ho, M.; Howard, V.; Kissela, B.; Kittner, S.; Lloyd-Jones, D.; McDermott, M.; Meigs, J.; Moy, C.; Nichol, G.; O’Donnell, C.; Roger, V.; Sorlie, P.; Steinberger, J.; Thom, T.; Wilson, M.; Hong, Y. Heart disease and stroke statistics-2008 update: a report from the American Heart Association Statistics Committee and Stroke Statistics Subcommittee. Circulation 2008, 117 (4), e25–146.
(7) Omenn, G. S. Exploring the human plasma proteome. Proteomics 2005, 5 (13), 3223–3225.
(8) Hood, B. L.; Zhou, M.; Chan, K. C.; Lucas, D. A.; Kim, G. J.; Issaq, H. J.; Veenstra, T. D.; Conrads, T. P. Investigation of the mouse serum proteome. J. Proteome Res. 2005, 4 (5), 1561–1568.
(9) Kislinger, T.; Cox, B.; Kannan, A.; Chung, C.; Hu, P.; Ignatchenko, A.; Scott, M. S.; Gramolini, A. O.; Morris, Q.; Hallett, M. T.; Rossant, J.; Hughes, T. R.; Frey, B.; Emili, A. Global survey of organ and organelle protein expression in mouse: combined proteomic and transcriptomic profiling. Cell 2006, 125 (1), 173–186.
(10) Cutillas, P. R.; Vanhaesebroeck, B. Quantitative profile of five murine core proteomes using label-free functional proteomics. Mol. Cell. Proteomics 2007, 6 (9), 1560–1573.
(11) Shi, R.; Kumar, C.; Zougman, A.; Zhang, Y.; Podtelejnikov, A.; Cox, J.; Wisniewski, J. R.; Mann, M. Analysis of the mouse liver proteome using advanced mass spectrometry. J. Proteome Res. 2007, 6 (8), 2963–2972.
(12) Van Hoof, D.; Mummery, C. L.; Heck, A. J.; Krijgsveld, J. Embryonic stem cell proteomics. Expert Rev. Proteomics 2006, 3 (4), 427–437.
(13) Graumann, J.; Hubner, N. C.; Kim, J. B.; Ko, K.; Moser, M.; Kumar, C.; Cox, J.; Schoeler, H.; Mann, M. SILAC-labeling and proteome quantitation of mouse embryonic stem cells to a depth of 5111 proteins. Mol. Cell. Proteomics 2008, 7, 672–683.
(14) Shin, J. H.; Krapfenbauer, K.; Lubec, G. Large-scale identification of cytosolic mouse brain proteins by chromatographic prefractionation. Electrophoresis 2006, 27 (13), 2799–2813.
(15) Gilchrist, A.; Au, C. E.; Hiding, J.; Bell, A. W.; Fernandez-Rodriguez, J.; Lesimple, S.; Nagaya, H.; Roy, L.; Gosline, S. J.; Hallett, M.; Paiement, J.; Kearney, R. E.; Nilsson, T.; Bergeron, J. J. Quantitative proteomics analysis of the secretory pathway. Cell 2006, 127 (6), 1265–1281.
(16) Mootha, V. K.; Bunkenborg, J.; Olsen, J. V.; Hjerrild, M.; Wisniewski, J. R.; Stahl, E.; Bolouri, M. S.; Ray, H. N.; Sihag, S.; Kamal, M.; Patterson, N.; Lander, E. S.; Mann, M. Integrated analysis of protein composition, tissue diversity, and gene regulation in mouse mitochondria. Cell 2003, 115 (5), 629–640.
(17) Zhang, J.; Li, X.; Mueller, M.; Wang, Y.; Zong, C.; Deng, N.; Vondriska, T. M.; Liem, D. A.; Yang, J. I.; Korge, P.; Honda, H.; Weiss, J. N.; Apweiler, R.; Ping, P. Systematic characterization of the murine mitochondrial proteome using functionally validated cardiac mitochondria. Proteomics 2008, 8 (8), 1564–1575.
(18) Schirmer, E. C.; Florens, L.; Guan, T.; Yates, J. R., 3rd; Gerace, L. Nuclear membrane proteins with potential disease links found by subtractive proteomics. Science 2003, 301 (5638), 1380–1382.
(19) Andersen, J. S.; Lam, Y. W.; Leung, A. K.; Ong, S. E.; Lyon, C. E.; Lamond, A. I.; Mann, M. Nucleolar proteome dynamics. Nature 2005, 433 (7021), 77–83.
(20) Gramolini, A. O.; Kislinger, T.; Alikhani-Koopaei, R.; Fong, V.; Thompson, N. J.; Isserlin, R.; Sharma, P.; Oudit, G. Y.; Trivieri, M. G.; Fagan, A.; Kannan, A.; Higgins, D. G.; Huedig, H.; Hess, G.; Arab, S.; Seidman, J. G.; Seidman, C. E.; Frey, B.; Perry, M.; Backx, P. H.; Liu, P. P.; MacLennan, D. H.; Emili, A. Comparative proteomics profiling of a phospholamban mutant mouse model of dilated cardiomyopathy reveals progressive intracellular stress responses. Mol. Cell. Proteomics 2008, 7 (3), 519–533.
(21) Gramolini, A. O.; Kislinger, T.; Liu, P.; MacLennan, D. H.; Emili, A. Analyzing the cardiac muscle proteome by liquid chromatography-mass spectrometry-based expression proteomics. Methods Mol. Biol. 2007, 357, 15–31.
(22) Kislinger, T.; Gramolini, A. O.; MacLennan, D. H.; Emili, A. Multidimensional protein identification technology (MudPIT): technical overview of a profiling method optimized for the comprehensive proteomic investigation of normal and diseased heart tissue. J. Am. Soc. Mass Spectrom. 2005, 16 (8), 1207–1220.
(23) Lea, M. A.; Walker, D. G. Factors affecting hepatic glycolysis and some changes that occur during development. Biochem. J. 1965, 94, 655–665.
(24) Unwin, R. D.; Whetton, A. D. Systematic proteome and transcriptome analysis of stem cell populations. Cell Cycle 2006, 5 (15), 1587–1591.
(25) Klein, C.; Garcia-Rizo, C.; Bisle, B.; Scheffer, B.; Zischka, H.; Pfeiffer, F.; Siedler, F.; Oesterhelt, D. The membrane proteome of Halobacterium salinarum. Proteomics 2005, 5 (1), 180–197.
(26) Barbe, L.; Lundberg, E.; Oksvold, P.; Stenius, A.; Lewin, E.; Bjorling, E.; Asplund, A.; Ponten, F.; Brismar, H.; Uhlen, M.; Andersson-Svahn, H. Toward a confocal subcellular atlas of the human proteome. Mol. Cell. Proteomics 2008, 7 (3), 499–508.
(27) Alland, L.; Peseckis, S. M.; Atherton, R. E.; Berthiaume, L.; Resh, M. D. Dual myristylation and palmitylation of Src family member p59fyn affects subcellular localization. J. Biol. Chem. 1994, 269 (24), 16701–16705.
(28) Hajnoczky, G.; Csordas, G.; Madesh, M.; Pacher, P. The machinery of local Ca2+ signalling between sarco-endoplasmic reticulum and mitochondria. J. Physiol. 2000, 529 (Pt. 1), 69–81.
(29) Montisano, D. F.; Cascarano, J.; Pickett, C. B.; James, T. W. Association between mitochondria and rough endoplasmic reticulum in rat liver. Anat. Rec. 1982, 203 (4), 441–450.
(30) Nagasaka, Y.; Dillner, K.; Ebise, H.; Teramoto, R.; Nakagawa, H.; Lilius, L.; Axelman, K.; Forsell, C.; Ito, A.; Winblad, B.; Kimura, T.; Graff, C. A unique gene expression signature discriminates familial Alzheimer’s disease mutation carriers from their wild-type siblings. Proc. Natl. Acad. Sci. U.S.A. 2005, 102 (41), 14854–14859.
(31) McDonald, M. J.; Rosbash, M. Microarray analysis and organization of circadian gene expression in Drosophila. Cell 2001, 107 (5), 567–578.
(32) Futcher, B.; Latter, G. I.; Monardo, P.; McLaughlin, C. S.; Garrels, J. I. A sampling of the yeast proteome. Mol. Cell. Biol. 1999, 19 (11), 7357–7368.
(33) Ghaemmaghami, S.; Huh, W. K.; Bower, K.; Howson, R. W.; Belle, A.; Dephoure, N.; O’Shea, E. K.; Weissman, J. S. Global analysis of protein expression in yeast. Nature 2003, 425 (6959), 737–741.
(34) Greenbaum, D.; Colangelo, C.; Williams, K.; Gerstein, M. Comparing protein abundance and mRNA expression levels on a genomic scale. Genome Biology 2003, 4 (9), 117.
(35) Greenbaum, D.; Jansen, R.; Gerstein, M. Analysis of mRNA expression and protein abundance data: an approach for the comparison of the enrichment of features in the cellular population of proteins and transcripts. Bioinformatics 2002, 18 (4), 585–596.
(36) Yin, L.; Tao, Y.; Zhao, K.; Shao, J.; Li, X.; Liu, G.; Liu, S.; Zhu, L. Proteomic and transcriptomic analysis of rice mature seed-derived callus differentiation. Proteomics 2007, 7 (5), 755–768.
(37) Bai, Y.; Shakeley, R. M.; Attardi, G. Tight control of respiration by NADH dehydrogenase ND5 subunit gene expression in mouse mitochondria. Mol. Cell. Biol. 2000, 20 (3), 805–815.
(38) Blackler, A. R.; Klammer, A. A.; MacCoss, M. J.; Wu, C. C. Quantitative comparison of proteomic data quality between a 2D and 3D quadrupole ion trap. Anal. Chem. 2006, 78 (4), 1337–1344.

PR800845A

Journal of Proteome Research • Vol. 8, No. 4, 2009 1901