Quantitative Characterizations of Proteome: Dependence on the Number of Proteins Considered

Milan Randić*
National Institute of Chemistry, Ljubljana, Slovenia
Received December 14, 2005

We consider the sensitivity of numerical characterizations of the proteome to the number of proteins considered in the analysis. We examined proteomics maps of liver cells of a control group of mice and of mice exposed to four peroxisome proliferators. The number of proteins used in the quantitative analysis was varied from 25 up to 1000, and for each case we compared the resulting similarity/dissimilarity values. We found that proteome maps based on the set of about 300 most abundant protein spots suffice for a satisfactory numerical characterization of the corresponding proteome.

Keywords: proteome characterization • graph theory of proteome • sensitivity of numerical characterizations • proteome quantitative analysis

* To whom correspondence should be addressed. Visiting. Permanent address: 3225 Kingman Rd., Ames, IA 50014. Fax: (515) 292-8629. DOI: 10.1021/pr050463+. CCC: $33.50. © 2006 American Chemical Society.

Introduction

The importance of quantitative numerical characterizations of proteomics maps is self-evident: such characterizations make possible various computer manipulations of the available visual or tabular raw data, which may offer new insights into the cell proteome as a whole. For instance, Randić and Estrada [1] recently reexamined the data of Anderson et al. [2] on proteomics maps for six different concentrations of the peroxisome proliferator LY171883 and showed that the dose-response curve exhibits hormesis, that is, a nonlinear dependence represented by a J-shaped curve. Hormesis [3,4] has long been observed in organisms as a whole, where it often manifests as beneficial effects of very small doses of toxic substances or harmful treatments. For example, rats exposed to small doses of irradiation were shown to have a better survival rate than an unexposed control group when both groups were subsequently subjected to large doses of irradiation [5]. The data of Anderson et al. had previously been examined with statistical methods (principal component analysis) [2] and with two different numerical characterizations of proteomics maps [6,7], none of which revealed hormesis at the proteome level. The discovery of hormesis awaited an alternative approach that focused on the proteome (the abundances of protein spots in a 2-D gel) rather than on proteomics maps (which also include information on the mass and charge of the proteins).

By "characterization" of a proteomics map or of a proteome we mean the representation of the 2-D gel map of the proteins of a cell, or of the set of proteins of a cell, respectively, by a collection of numbers that represent mathematical properties of these systems. Such numbers are typically extracted from a mathematical representation of the 2-D gel by various matrices or by other algebraic manipulations of the raw data, which consist of the list of spot coordinates and their abundances [8-20]. In this article, we examine how sensitive such characterizations of proteomics data are to the number of protein spots included in the analysis. Even though typical proteomics maps show between 1000 and 2000 visible protein spots, most past characterizations of such maps by mathematical invariants have been limited to a small fraction of the spots, typically the two dozen or so most intense ones.

One arrives at a quantitative characterization of a proteomics map by associating with the map a well-defined geometrical object. As elaborated in several publications on the numerical characterization of proteomics maps, such mathematical constructions of fixed geometry are superposed over selected spots of the map. They include a zigzag curve that connects protein spots in order of decreasing abundance [8,9], the partial-ordering graph that connects spots that dominate neighboring spots in mass and charge [11], the cluster graph that connects all spots within a selected critical distance [15], and graphs that connect a selected number of nearest-neighbor protein spots [16,17,19]. In all these cases, one selects the number N of protein spots of the map to be included in the analysis. In previous studies N was chosen ad hoc, usually in the range 20-30 and at most 100, on the assumption that this suffices for an adequate representation of the maps. Here, we scrutinize this assumption and examine how the characterization of a proteomics map depends on the number of spots selected for analysis.
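To make the common first step concrete, the sketch below keeps the N most abundant spots and joins them by a zigzag curve in order of decreasing abundance, the simplest of the constructions listed above. It is a minimal Python illustration under our own assumptions: `Spot`, `top_n_spots`, and `zigzag_segments` are hypothetical names, the coordinates and abundances are the first five control-group entries of Table 1, and the curve is summarized only by its segment lengths, not by the matrix invariants used in [8,9].

```python
"""Keep the N most abundant spots and join them by a zigzag curve.

Hypothetical helper names; an illustration of the general idea only,
not the construction used in the cited work.
"""
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Spot:
    x: float          # spot coordinate in the 2-D gel
    y: float          # spot coordinate in the 2-D gel
    abundance: float  # spot intensity


def top_n_spots(spots, n):
    """Return the n most abundant spots, in order of decreasing abundance."""
    return sorted(spots, key=lambda s: s.abundance, reverse=True)[:n]


def zigzag_segments(spots, n):
    """Lengths of the n - 1 segments of the curve that joins the n most
    abundant spots in order of decreasing abundance."""
    path = top_n_spots(spots, n)
    return [math.dist((a.x, a.y), (b.x, b.y)) for a, b in zip(path, path[1:])]


# First five control-group spots of Table 1, deliberately listed out of order.
control = [
    Spot(2685.6, 1196.1, 118581),
    Spot(2237.0, 2278.6, 144357),
    Spot(1183.9, 959.6, 136653),
    Spot(2182.2, 928.8, 127195),
    Spot(2804.3, 903.6, 143630),
]

if __name__ == "__main__":
    for n in (3, 5):
        print(n, [round(d, 1) for d in zigzag_segments(control, n)])
```

Changing `n` is the choice of N discussed in the text; every invariant computed afterwards depends on which spots survive this cut.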
We consider five proteome maps of mouse liver cells, belonging to the control group and to mice exposed to the following four peroxisome proliferators: PFOA, PFDA, Clofibrate, and DEHP. These proteomics maps have been used in some of our previous publications. Details of the experimental procedure have been outlined by Frank Witzmann, whose data we are using [21,22].

Journal of Proteome Research 2006, 5, 1575-1579. Published on Web 05/23/2006.

Table 1. Input x,y Coordinates for the 25 Most Abundant Protein Spots (of 1054 Spots) of the Control Group, and the Abundances for the Control Group and the Four Proliferators

Spot        x        y  control     PFOA     PFDA     CLFB     DEHP
          A-E      A-E        A        B        C        D        E
   1   2237.0   2278.6   144357   108713    95028   147081   165886
   2   2804.3    903.6   143630   155565   188582   159898   155055
   3   1183.9    959.6   136653   113859   150253   163645     8111
   4   2182.2    928.8   127195    99160    73071    76642   112096
   5   2685.6   1196.1   118581   112790    49769   109856   138795
   6   1527.9    825.5   114929   192437   221567   166080   180590
   7   1346.0   1352.5   112251    58669    38915    73159    77075
   8   2868.5    778.0   108883    26105    50735    45923   116849
   9   1406.3   1118.1    98224    91147    82963    84196    92942
  10   2646.6   1288.0    94128   100030    84933   100303   143490
  11   2450.2    409.2    93601    83172    62934    79870   109381
  12   1474.0    665.1    90004   129340   112361   112655   119402
  13   2974.9    772.8    86730    70746    78691   105760   116281
  14   2068.4    823.1    84842    73814    45482    71911    97444
  15    642.2    669.8    82492    73974    74466    84703    88545
  16   2860.7   1649.9    81965    16137    16501    60077   148992
  17   2032.7    902.8    80015    77314    80072    76027   100836
  18   2752.7    765.6    79847    20782    13103    38816    53830
  19   2334.2    982.2    72791    76369    52749    55599    77432
  20   1053.6    864.3    72173    77982    60376    46808    78121
  21   2519.5   1365.9    69452    37838    16129    57167    71274
  22   2552.5   2409.4    67772    75767    98740   135758   124008
  23   1214.3    620.0    64684    63511    38075    58364    75760
  24   2651.1   1149.6    61074    56365    54972    85399    97303
  25   2327.9    677.3    59294    55535    32155    60155    68464

Table 2. Non-Zero Entries of the Similarity/Dissimilarity Matrix Based on the Magnitudes of the Vectors of Table 5 for N = 25 to N = 1000 (rows: pairs of maps; columns: number N of protein spots considered)

N         25       50      100      200      300      400
SAB   6887.7   4139.3   2325.7   1242.3    848.7    641.9
SAC   8983.2   3496.2   1865.8   1068.7    753.7    576.3
SAD   5897.1   3827.9   2086.3   1213.9    967.7    737.1
SAE   7834.3   5211.5   2785.8   1469.4   1086.7    820.5
SBC   4747.7   5315.9   2971.8   1657.5   1146.8    883.9
SBD   5057.9   3537.5   1881.0   1035.5    845.1    640.9
SBE   9318.0   6888.3   3747.2   2037.0   1383.8   1045.7
SCD   5880.8   3392.2   2028.0   1199.6    952.0    742.0
SCE  11463.3   5184.3   2878.7   1574.7   1098.2    848.5
SDE   8635.6   4626.9   2795.2   1602.3   1084.2    820.2

N        500      600      700      800      900     1000
SAB    516.5    433.6    373.5    327.5    293.4    264.3
SAC    469.0    397.7    347.5    308.4    284.1    257.5
SAD    596.6    502.5    435.5    384.6    348.5    315.0
SAE    679.0    569.4    489.3    490.3    436.5    393.0
SBC    747.6    627.1    541.6    532.4    474.0    426.9
SBD    515.1    430.8    369.9    324.2    290.6    261.8
SBE    864.9    723.6    624.0    598.2    536.2    483.7
SCD    605.5    512.4    443.7    389.6    347.1    312.6
SCE    720.8    605.0    522.8    517.8    461.1    415.3
SDE    688.7    575.2    495.1    494.3    439.9    396.1

Proteome versus Proteomics Maps

In Table 1 we list the input data that we will analyze and try to characterize as a whole, rather than examining the individual entries. The first two columns of Table 1 give the x,y coordinates of the 25 most intense spots of the proteomics map of the control group, while the remaining five columns give the corresponding abundances for the control group and for the four groups of animals exposed to the four peroxisome proliferators. The first two columns combined with any one of the remaining columns constitute the data of a single proteomics map, whereas the abundances listed in the remaining five columns, considered separately and without reference to the first two columns, constitute data on the liver cell proteome. It is important to maintain this distinction between the characterization of proteomics maps and the characterization of the proteome. For example, as recently pointed out by Randić and Estrada [1], earlier analyses of proteomics maps for different doses of the peroxisome proliferator LY171883 in the diet of rats only suggested, or could have suggested, somewhat different perturbations of the proteome at small and large concentrations of LY171883. However, when the analysis was limited to the proteome, rather than the proteomics maps, the dose-response curve immediately showed the characteristic J shape, clearly indicating the presence of hormesis. This was the first time that hormesis was detected at the proteome level. In view of this apparently greater sensitivity of proteome data, we focus in this article on the proteome rather than on proteomics maps, and we investigate how sensitive a characterization of the cell proteome is to the number of proteins considered. This means that our analysis will not use the x,y coordinates of the protein spots in the 2-D gel, but only their abundances. Consequently, we need not consider scaling of the data and will use the raw abundance data of Table 1 as reported by F. Witzmann [21].

Figure 1. Plot of abundance for the protein spots 1-25, 501-525, and 1001-1025.

Figure 2. Protein map depicting N = 50, 100, 200, and 400 protein spots.
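The distinction between a proteomics map and a proteome can be expressed as two data shapes. In this hypothetical Python sketch (`MapRow`, `proteomics_map`, and `proteome` are our names, not taken from the source), a map pairs the shared x,y columns of Table 1 with one abundance column, and the proteome keeps only that column, unscaled, as in the analysis described above.

```python
"""Proteomics map versus proteome, as distinguished in the text.

Hypothetical representation: a map is a list of (x, y, abundance) rows,
and the proteome is the abundance column alone.
"""
from typing import List, NamedTuple, Sequence


class MapRow(NamedTuple):
    x: float          # gel coordinate (shared by maps A-E)
    y: float          # gel coordinate (shared by maps A-E)
    abundance: float  # raw spot abundance for one map


def proteomics_map(xs: Sequence[float], ys: Sequence[float],
                   abundances: Sequence[float]) -> List[MapRow]:
    """Combine the shared x,y columns with one abundance column (A-E)."""
    if not (len(xs) == len(ys) == len(abundances)):
        raise ValueError("columns must have equal length")
    return [MapRow(x, y, z) for x, y, z in zip(xs, ys, abundances)]


def proteome(rows: Sequence[MapRow]) -> List[float]:
    """Drop the gel coordinates and keep only the raw abundances."""
    return [row.abundance for row in rows]


# First three spots of Table 1: shared coordinates, control (A) and PFOA (B).
xs = [2237.0, 2804.3, 1183.9]
ys = [2278.6, 903.6, 959.6]
map_a = proteomics_map(xs, ys, [144357, 143630, 136653])
map_b = proteomics_map(xs, ys, [108713, 155565, 113859])

if __name__ == "__main__":
    print(proteome(map_a))  # raw abundances only; no rescaling is applied
    print(proteome(map_b))
```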

Outline of the Approach

Let us use the labels A-E for the five abundance columns of Table 1, corresponding to the control group, PFOA, PFDA, CLFB (Clofibrate), and DEHP, respectively. Figure 1 shows the relative abundances of the 25 most intense spots, of spots 501-525, and of spots 1001-1025. As we can see, the oscillations of the relative abundance persist even for proteins belonging to rather weak spots, ranked 1001-1025 in abundance. Observe, however, the change in scale among the three groups of protein spots, whose abundances differ by approximately 1 order of magnitude from one group to the next: in Figure 1a the most abundant spot approaches an intensity of 250 000, in Figure 1b it approaches 30 000, and in Figure 1c the most intense spot has an abundance below 6000. Figure 1 also shows that, although the variation of the abundance of individual proteins across the proliferators appears chaotic overall, a few proteins, such as protein 22 in Figure 1a, proteins 6, 12, and 18 in Figure 1b, and possibly protein 13 in Figure 1c, show the same trend for all four proliferators. Our interest here, however, is not in the response of individual proteins but in the collective response of all proteins considered. We will therefore consider, for a given number N of protein spots, the total overall change in protein abundance after the introduction of each of the four proliferators, in comparison with the overall quantity of proteins in the control group.

The first column of Table 2 lists the entries, for N = 25, of the similarity/dissimilarity matrix of the five proteomics maps A-E. Each entry is the square root of the sum of squared differences of abundance between a pair of maps:

$$S_{XY} = \sqrt{\sum_{i=1}^{N} \left( Z_{Xi} - Z_{Yi} \right)^{2}}$$

where $Z_{Xi}$ and $Z_{Yi}$ are the abundances of protein i in maps X and Y, and X and Y take the labels A-E. The smallest entry (SBC = 4747.7) indicates the most similar pair of proteomics maps, and the largest entry (SCE = 11 463.3) points to the least similar pair. If we confine attention to the first four entries of the first column of Table 2, we see that SAD < SAB < SAE < SAC, which means that, of the four peroxisome proliferators, Clofibrate causes the smallest and PFDA the largest perturbation of the cell proteome, at least when we confine attention to the 25 leading proteins of the control cell. Will this hold when we increase the number of proteins considered? To answer this question, we explore how the 10 entries of the first column of Table 2 change with the number of protein spots considered: besides N = 25 we consider N = 50, 100, 200, 300, and so on up to N = 1000, looking for possible changes in the relative values of the 10 quantities SAB to SDE. If we observe no significant change in their relative values, then the inclusion of additional proteins does not alter the existing characterization; that is, the protein spots beyond N play no significant role in the characterization of the proteomics maps and the cell proteome. A single descriptor, like a single map invariant, contains rather limited information, and more conclusive statements would require additional descriptors or map invariants. However, as we will see, a well-selected single descriptor can provide useful information on the degree of similarity of cell proteomes or proteomics maps.

Figure 3. Correlations of the adjacent columns of Table 2 showing the degree of parallelism between maps of different number of protein spots.
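The definition of $S_{XY}$ above can be checked directly against the abundances of Table 1. In the Python sketch below, `ABUNDANCE`, `s_xy`, and `similarity_entries` are hypothetical names and the data are copied from Table 1. When we recompute the N = 25 column this way, the plain root-sum-of-squares comes out 25 times larger than the tabulated entries (about 172 193 for the control/PFOA pair, against SAB = 6887.7), while dividing it by N reproduces Table 2; we checked SAB, SAD, and SBC, which agree to within 0.1. The extra division by N is therefore our inference from the numbers, not a step stated in the text, and the sketch exposes it as an option.

```python
"""Similarity/dissimilarity entries S_XY for the 25 spots of Table 1.

A sketch, not the author's code. S_XY is the square root of the sum of
squared abundance differences (the formula in the text); the optional
division by N is our assumption, inferred because it reproduces Table 2.
"""
import math
from itertools import combinations

# Abundances of the 25 most abundant control spots (Table 1), maps A-E.
ABUNDANCE = {
    "A": [144357, 143630, 136653, 127195, 118581, 114929, 112251, 108883,
          98224, 94128, 93601, 90004, 86730, 84842, 82492, 81965, 80015,
          79847, 72791, 72173, 69452, 67772, 64684, 61074, 59294],
    "B": [108713, 155565, 113859, 99160, 112790, 192437, 58669, 26105,
          91147, 100030, 83172, 129340, 70746, 73814, 73974, 16137, 77314,
          20782, 76369, 77982, 37838, 75767, 63511, 56365, 55535],
    "C": [95028, 188582, 150253, 73071, 49769, 221567, 38915, 50735,
          82963, 84933, 62934, 112361, 78691, 45482, 74466, 16501, 80072,
          13103, 52749, 60376, 16129, 98740, 38075, 54972, 32155],
    "D": [147081, 159898, 163645, 76642, 109856, 166080, 73159, 45923,
          84196, 100303, 79870, 112655, 105760, 71911, 84703, 60077, 76027,
          38816, 55599, 46808, 57167, 135758, 58364, 85399, 60155],
    "E": [165886, 155055, 8111, 112096, 138795, 180590, 77075, 116849,
          92942, 143490, 109381, 119402, 116281, 97444, 88545, 148992,
          100836, 53830, 77432, 78121, 71274, 124008, 75760, 97303, 68464],
}


def s_xy(zx, zy, n, divide_by_n=True):
    """Root of the sum of squared differences over the first n proteins,
    optionally divided by n (assumed normalization, see lead-in)."""
    root = math.sqrt(sum((a - b) ** 2 for a, b in zip(zx[:n], zy[:n])))
    return root / n if divide_by_n else root


def similarity_entries(data, n):
    """The 10 off-diagonal entries, keyed 'AB', 'AC', ..., 'DE'."""
    return {x + y: s_xy(data[x], data[y], n)
            for x, y in combinations(sorted(data), 2)}


if __name__ == "__main__":
    for pair, value in similarity_entries(ABUNDANCE, 25).items():
        print(f"S{pair} = {value:8.1f}")
```

Extending the check to N > 25 would require the full list of 1054 spots, which is not reproduced here.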

Sensitivity of SXY to the Number of Spots

Figure 2 shows a series of proteomics maps of the control group, each based on a different number of spots (N = 50, 100, 200, and 300), to give a visual impression of the density of spots for different N. The question we want to address is the following: is there a number of spots of a proteomics map, or a number of proteins of a proteome, that suffices to characterize the proteomics map or the proteome, respectively, adequately? In other words, do we need all 1054 listed spots (or their abundances) for a useful comparative study of proteomics maps? The question has two aspects: (1) whether the characterization of a proteome can be based on fewer data, which would make comparative studies of large data sets more efficient; and (2) whether the noise due to the growing number of weak signals can be reduced, because as the number of spots in the list increases, their relative abundance decreases and the relative experimental error in measuring the spot density grows, which makes the analysis less reliable. To gain insight into how sensitive the numerical characterization of proteomics maps is to N, we calculated, for N = 25 to N = 1000, the level of perturbation of each of the five proteomes relative to the others. Table 2 lists the entries of the symmetric similarity/dissimilarity matrix for N = 25 to N = 1000. Smaller entries indicate similar proteomics maps, whereas larger entries indicate dissimilar maps. The first element of the first column, SAB, gives the similarity/dissimilarity between the control group and PFOA, and the last element of the column, SDE, gives the similarity/dissimilarity between Clofibrate and DEHP. Successive columns correspond to increasing values of N. The similarity/dissimilarity values decrease with N because, as the number of spots increases, the abundances of the added protein spots in the 2-D gel maps decrease.

To see whether increasing N is accompanied by an increase in information content, we have to examine the entries of Table 2 more closely. If the numbers in successive columns are proportional, then N is not a critical parameter for the numerical characterization of the proteome; if the numbers in adjacent columns show considerable scatter, the opposite is the case. To find out which holds, Figure 3 plots the correlations between adjacent columns of Table 2: 25/50, 50/100, 100/200, 200/300, and so on up to 600/700. We stopped at 600/700 because the further correlations showed no significant deviation from a regression line. The plots in Figure 3 are very instructive: as N increases, the entries of adjacent columns of Table 2 give better and better regressions, steadily approaching a simple linear relationship. Already the plot of the entries for 300/400 shows a satisfactory linear relationship. This means that beyond N = 300 we see no significant change in the relative magnitudes of the corresponding entries in adjacent columns of Table 2; in other words, beyond N = 300 we gain no significant information by adding further protein spots to the analysis. We conclude that we could stop at N = 300 and obtain all the information of interest for the characterization of the proteome, without the additional labor of considering a larger set of protein spots.
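The parallelism of adjacent columns that Figure 3 shows graphically can also be summarized by one number per column pair. The sketch below uses Pearson's correlation coefficient r, which is our choice and not a statistic reported in the text; `TABLE2`, `pearson_r`, and `adjacent_correlations` are hypothetical names, and the values are copied from the first block of Table 2. In our own check, r is roughly 0.43 for 25/50 and above 0.99 for 300/400, in line with the visual trend described above.

```python
"""Summarize the parallelism of adjacent columns of Table 2.

The text judges Figure 3 by eye; Pearson's r is our own choice of a
numerical summary and is only an illustration.
"""
import math

# Columns of Table 2 (entries SAB ... SDE) for N = 25 to 400.
TABLE2 = {
    25:  [6887.7, 8983.2, 5897.1, 7834.3, 4747.7, 5057.9, 9318.0, 5880.8, 11463.3, 8635.6],
    50:  [4139.3, 3496.2, 3827.9, 5211.5, 5315.9, 3537.5, 6888.3, 3392.2, 5184.3, 4626.9],
    100: [2325.7, 1865.8, 2086.3, 2785.8, 2971.8, 1881.0, 3747.2, 2028.0, 2878.7, 2795.2],
    200: [1242.3, 1068.7, 1213.9, 1469.4, 1657.5, 1035.5, 2037.0, 1199.6, 1574.7, 1602.3],
    300: [848.7, 753.7, 967.7, 1086.7, 1146.8, 845.1, 1383.8, 952.0, 1098.2, 1084.2],
    400: [641.9, 576.3, 737.1, 820.5, 883.9, 640.9, 1045.7, 742.0, 848.5, 820.2],
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def adjacent_correlations(table):
    """Map each pair of adjacent N values, e.g. (25, 50), to its r."""
    ns = sorted(table)
    return {(a, b): pearson_r(table[a], table[b]) for a, b in zip(ns, ns[1:])}


if __name__ == "__main__":
    for (a, b), r in adjacent_correlations(TABLE2).items():
        print(f"{a}/{b}: r = {r:.3f}")
```

Pearson's r measures linearity rather than strict proportionality; a least-squares fit through the origin would test the proportionality criterion stated in the text more directly.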
The cases 50/100, 100/200, and 200/300 show improved correlations over the case 25/50, which clearly appears not to be sufficiently representative of the proteome as a whole. The continued strong correlations between adjacent columns of Table 2 beyond the case 300/400 support the main conclusion that no significant additional information on the proteome as a whole is obtained by increasing the number of spots in a map beyond 300. This, of course, does not mean that individual variations of low-abundance proteins, some of which are illustrated in Figure 1b,c, are unimportant; they appear, however, not to be important for characterizing the overall change in the cell proteome. In conclusion, earlier studies on the characterization of proteomics maps that were confined to significantly fewer than 300 protein spots can be viewed as describing novel computational approaches to the characterization of proteomes and proteomics maps; for more reliable results, however, such approaches should include some 300 of the most intense protein spots, rather than 30, to ensure that there is no reversal in the relative degree of similarity among different proteomics maps. A similar conclusion was obtained for the characterization of proteomics maps [22], which differs from the present study in also including the x,y coordinates of the protein spots; these coordinates are meaningless when one considers the proteome rather than the proteomics map.

Acknowledgment. The author wishes to thank the reviewer for numerous suggestions on the manuscript, which have led to an improved presentation of the material.

References

(1) Randić, M.; Estrada, E. Order from chaos: Observing hormesis at the proteome level. J. Proteome Res. 2005, 4, 2133-2136.
(2) Anderson, N. L.; Esquer-Blasco, R.; Richardson, F.; Foxworthy, P.; Eacho, P. The effect of peroxisome proliferators on protein abundances in mouse liver. Toxicol. Appl. Pharmacol. 1996, 137, 75-89.
(3) Calabrese, E. J.; Baldwin, L. A. Hormesis: The dose-response revolution. Annu. Rev. Pharmacol. Toxicol. 2003, 43, 175-197.
(4) Calabrese, E. J.; Baldwin, L. A. Toxicology rethinks its central belief. Hormesis demands a reappraisal of the way risks are assessed. Nature 2003, 421, 691-692.
(5) Randić, M.; Supek, Z. Influence of high doses of x-radiation on 5-hydroxytryptamine in the brain of rats. Int. J. Radiat. Biol. 1961, 4, 637-638.
(6) Randić, M.; Vračko, M.; Novič, M. On characterization of dose variations of 2-D proteomics maps by matrix invariants. J. Proteome Res. 2002, 1, 217-226.
(7) Vračko, M.; Basak, S. C. Similarity study of proteomic maps. Chemometr. Intell. Lab. Syst. 2004, 70, 33-38.
(8) Randić, M. On graphical and numerical characterization of proteomics maps. J. Chem. Inf. Comput. Sci. 2001, 41, 1330-1338.
(9) Randić, M.; Zupan, J.; Novič, M. On 3-D graphical representation of proteomics maps and their numerical characterization. J. Chem. Inf. Comput. Sci. 2001, 41, 1339-1334.
(10) Randić, M.; Witzmann, F.; Vračko, M.; Basak, S. C. On characterization of proteomics maps and chemically induced changes in proteomics using matrix invariants: Application to peroxisome proliferators. Med. Chem. Res. 2001, 10, 456-479.
(11) Randić, M. A graph theoretical characterization of proteomics maps. Int. J. Quantum Chem. 2002, 90, 848-858.
(12) Randić, M.; Basak, S. C. A comparative study of proteomics maps using graph theoretical biodescriptors. J. Chem. Inf. Comput. Sci. 2002, 42, 983-992.
(13) Randić, M.; Zupan, J.; Novič, M.; Gute, B. D.; Basak, S. C. Novel matrix invariants for characterization of changes of proteomics maps. SAR QSAR Environ. Res. 2002, 13, 689-703.
(14) Randić, M. Quantitative characterization of proteomics maps by matrix invariants. In Handbook of Proteomics Methods; Conn, P. M., Ed.; Humana Press: Totowa, NJ, 2003; pp 429-450.
(15) Bajzer, Ž.; Randić, M.; Plavšić, D.; Basak, S. C. Novel matrix invariants for characterization of toxic effects on proteomics maps. J. Mol. Graph. Model. 2003, 22, 1-9.
(16) Randić, M.; Lerš, N.; Plavšić, D.; Basak, S. C. Characterization of 2-D proteome maps based on the nearest neighborhoods of spots. Croat. Chem. Acta 2004, 77, 345-351.
(17) Randić, M.; Lerš, N.; Plavšić, D.; Basak, S. C. On invariants of a 2-D proteome map derived from neighborhood graphs. J. Proteome Res. 2004, 3, 778-785.
(18) Bajzer, Ž.; Basak, S. C.; Vračko, M.; Grobelšek, M.; Randić, M. Use of proteomics based biodescriptors in the characterization of chemical toxicity. In Genomic and Proteomic Applications in Toxicity Testing; Cunningham, M. J., Ed.; Humana Press: Totowa, NJ, 2005.
(19) Randić, M.; Lerš, N.; Vukičević, D.; Plavšić, D.; Gute, B. D.; Basak, S. C. Canonical labeling for protein spots and proteomics maps. J. Proteome Res. 2005, 4, 1347-1352.
(20) Randić, M.; Novič, M.; Vračko, M. Novel characterization of proteomics maps by sequential neighborhoods of protein spots. J. Chem. Inf. Comput. Sci. 2005, 45, 1205-1213.
(21) Witzmann, F. Molecular Anatomy Laboratory, Department of Biology, Indiana University & Purdue University, Columbus, IN.
(22) Randić, M.; Witzmann, F. A.; Kodali, V.; Basak, S. C. On the dependence of a characterization of proteomics maps on the number of protein spots considered. J. Chem. Inf. Model. 2006, 46, 116-122.
