Complex Graph Matrix Representations and Characterizations of Proteomic Maps and Chemically Induced Changes to Proteomes

Krishnan Balasubramanian,*,†,‡ Kanan Khokhani,‡ and Subhash C. Basak§

Chemistry and Material Science Directorate, Lawrence Livermore National Laboratory, University of California, Livermore, California 94550, Glenn T. Seaborg Center, Lawrence Berkeley Laboratory, University of California, Berkeley, California 94720, Department of Mathematics and Computer Science, California State University, East Bay, Hayward, California 94542, and Natural Resources Research Institute, University of Minnesota at Duluth, 5013 Miller Trunk Highway, Duluth, Minnesota 55811

Received December 8, 2005
We have presented a complex graph matrix representation to characterize proteomics maps obtained from 2D-gel electrophoresis. In this method, each bubble in a 2D-gel proteomics map is represented by a complex number whose components are the charge and mass. Then, a graph with complex weights is constructed by connecting the vertices in the relative order of abundance. This yields adjacency matrices and distance matrices of the proteomics graph with complex weights. We have computed the spectra, eigenvectors, and other properties of complex graphs and the Euclidian/graph distance obtained from the complex graphs. The leading eigenvalues and eigenvectors and, likewise, the smallest eigenvalues and eigenvectors, and the entire graph spectral patterns of the complex matrices derived from them yield novel weighted biodescriptors that characterize proteomics maps with information on the charges and masses of proteins. We have also applied these eigenvector and eigenvalue maps to contrast the normal cells and cells exposed to four peroxisome proliferators, namely, clofibrate, diethylhexyl phthalate (DEHP), perfluorodecanoic acid (PFDA), and perfluorooctanoic acid (PFOA). Our complex eigenspectra show that the proteomic response induced by DEHP differs from the corresponding responses of the other three chemicals, consistent with their chemical structures and properties.

Keywords: 2D-gel pattern • proteome characterization • graph theory of proteome • chemically induced response • complex matrices
* To whom correspondence should be addressed. Phone, 925-422-4984.
† Chemistry and Material Science Directorate, Lawrence Livermore National Laboratory, University of California and Glenn T. Seaborg Center, Lawrence Berkeley Laboratory, University of California.
‡ California State University.
§ University of Minnesota at Duluth.
10.1021/pr050445s. © 2006 American Chemical Society. Journal of Proteome Research 2006, 5, 1133-1142.

1. Introduction
The evaluation of drugs and toxicants for their effects on the cellular proteome is central to many fields such as molecular pharmacology, drug discovery, and hazard assessment. Therefore, significant efforts have been devoted to the development of mathematical and computational techniques for characterizing proteomes, DNA, and their responses to chemicals.1-28 Proteomic maps contain information on the variations of the relative abundance, induction, and repression of thousands of proteins present in a cell and can serve as powerful tools to measure biochemical changes induced upon the cell by toxicants, drugs, and so on. In a typical experimental setup, cellular material (as homogeneous as possible, i.e., selecting cells from the same organs of experimental animals) is subjected to a combined electrophoretic and chromatographic analysis which results in a two-dimensional proteomics gel (2D-gel) in which thousands of proteins are separated.6,7 The experimental data consist of a list of the locations of the proteins (as x and y coordinates) and their abundance. The abundances are given by densities of the experimental spots in a gel, as has been described in the literature.9 When an animal is exposed to chemicals, the patterns of protein expression in affected cells can change appreciably. The changes may be due to the effects of exposure to chemicals or to abnormalities and departure of the cell from the normal state caused by alterations in cellular transcriptional and translational processes, as well as through post-translational modifications of individual proteins.8 To compare proteomics maps, one needs a numerical quantification of their protein patterns that leads to a condensed representation of the available data, offering a characterization based on a relatively small and manageable number of descriptors. Specific biodescriptors advanced thus far include (a) invariants of graphs or matrices associated with proteomics maps;2 (b) information-theoretic biodescriptors;11 (c) spectrum-like descriptors of proteomics patterns;12 and (d) critical protein biomarkers derived using statistical methods.29 Graph theory has been successfully applied to a number of problems in genomics11-16 and proteomics.1-3 For example,
Published on Web 03/28/2006
research articles
Balasubramanian et al.
Figure 1. “Bubble” diagram illustrating the location and abundance of individual proteins for the rat liver control in the experimental 2D gel.
Randić and co-workers2 have considered powers of matrices derived from associated graphs, called the D/D matrix approach, which is based on the graph distances and Euclidian distances between vertices which represent the proteins of the proteomics maps. The vertices are connected in the relative order of abundance to generate a graph, which then yields the various matrices. While the approach is quite interesting and the first of its kind, there is room for further development as noted by Randić et al.2 For example, the D/D matrix approach does not weight the vertices with the masses and charges of proteins, and thus, some intrinsic information pertinent to proteins may not be fully considered in this algorithm. In the present work, we have considered a new approach for the quantification of not only proteomic maps of the cell but also the chemical changes induced upon the cell by various peroxisome proliferators, namely, clofibrate, diethylhexyl phthalate (DEHP), perfluorodecanoic acid (PFDA), and perfluorooctanoic acid (PFOA). A typical proteomics map is shown in Figure 1, where we have represented each protein component as a bubble. The x and y axes represent the charge and mass of proteins, respectively. The data are from rat liver cells and were obtained by Witzmann18 and co-workers of Indiana University and Purdue University. Our present approach is to consider the mass and charge of each protein directly in addition to the relative abundances of the proteins. We have accomplished this by cross-fertilization of graph theory and complex algebra, weighting each vertex of the proteomics map by a complex number that uses the mass and charge of the protein as components. Consequently, the complex-weighted graph thus constructed takes into account the actual mass and charge of each protein and the relative abundances. In addition, chemically induced changes to the cell are easily represented by the complex-weighted graph procedure. With the complex graph, we obtain the eigenvalues or spectra, eigenvectors, and so on, which are plotted on a two-dimensional grid to characterize the proteomics map. We have shown that the spectral map differs substantially for the peroxisome proliferators that we have considered here, namely, clofibrate, DEHP, PFDA, and PFOA, whose chemical structures are shown in Figure 2.
2. Computational Methods and Proteomics Algorithms Based on Complex Matrices
Table 1 shows a typical 2D-gel pattern of proteins obtained from a cell and chemical changes induced to the cell by peroxisome proliferators, namely, clofibrate, DEHP, PFDA, and PFOA. In Table 1, we have listed principal protein components with charge and mass values of proteins from rat liver cells. The control represents the relative abundance of the proteins in the natural cell, while PFOA, PFDA, clofibrate, and DEHP data represent the chemical changes induced by these peroxisome proliferators, respectively. A bubble map thus generated on the (x,y) grid for the natural cell is shown in Figure 1.

Figure 2. Chemical structures of the four peroxisome proliferators tested on rat liver cells. Gray spheres are halogens, white spheres are hydrogens, cyan spheres are carbons, and red spheres are oxygens.

Table 1. The X and Y Coordinates and the Abundance for the Control Rat Liver Cells and When the Animal Is Exposed to the Four Chemicals Shown in Figure 2

no.   x        y        control   PFOA     PFDA     clofibrate   DEHP
22    1183.9   959.6    136653    113859   150253   163645       8111
52    2182.2   928.8    127195    99160    73071    76642        112.096
20    1527.9   825.5    114929    192437   221567   166080       180590
62    1346     1352.5   112251    58669    38915    73159        77075
48    1406.3   1118.1   98224     91147    82963    84196        92942
9     1474     665.1    90004     129340   112361   112655       119402
36    2068.4   823.1    84842     73814    45482    71911        97444
2     642.2    669.8    82492     73974    74466    84703        88545
44    2032.7   902.8    80015     77314    80072    76027        100836
15    1053.6   864.3    72173     77982    60376    46808        78121
5     1214.3   620      64684     63511    38075    58364        75760
45    2094.5   680.5    58977     142865   46225    48625        146609
1     1021.7   390.2    58001     56547    53473    60224        71654
56    2070.4   929.6    55402     46146    33152    59438        69031
35    1375.7   992.3    49027     42506    52137    46058        69214
19    1623.4   640.8    48976     81452    133705   64580        65976
47    1842.5   885.9    48145     40390    24149    47585        52350
14    1189.5   614.7    42773     49044    77144    46005        59322
26    1465.5   821.1    40923     60359    94014    79981        3838
41    1323.4   993      36433     30640    31611    33764        44692
39    1278.8   981.6    35896     31707    21801    29026        32956
24    1433.5   662.3    31194     42226    41489    42432        69142
29    1170     862.2    30510     29742    30786    26460        37812
33    1139.2   958.4    29296     30067    39531    39204        27565
18    1167.3   611.7    26155     25182    41604    21039        23170
12    1202.3   495.5    25389     22811    17341    20416        30852
40    1030.2   863.2    24006     28597    36744    50236        46151
30    1122.7   863      22344     31904    18559    17418        19410
57    1894.5   903.1    20142     14044    13687    16071        17075

Since we are considering a new mathematical approach that combines the principles of graph theory with complex variables, we first introduce the basic concepts of graphs as pertinent to proteomics. A graph is simply a collection of
vertices connected by edges. One can envisage the various proteins in the bubble graph in Figure 1 as the vertices of a graph. The question that naturally arises is then how one could introduce edges or bonds between the vertices. In accordance with Randić et al.,2 the edges can be introduced by connecting the vertices in the order of relative abundance. Such a graph is shown in Figure 3. Once we have a graph for the proteomics map, as seen from Figure 3, we can use graph theoretical concepts and algorithms to characterize the proteomics maps. Moreover, as we show here, one can invoke complex algebra and arithmetic to characterize the proteomics maps and chemically induced changes to the liver. We thus introduce basic definitions and preliminaries for the algorithms considered here. As seen from Table 1, we are considering data for N = 20 proteins which are principal components in terms of relative abundance of the rat liver cell under consideration. The data contain charge, mass, and natural abundance for the N proteins. First, we have normalized the data provided in decreasing order of abundance. By normalization, it is meant that the highest charge and mass are set to unity, and all other proteins' charges and masses are scaled relative to the maximal value of 1.0. We have developed a computer code in the language R that reads these normalized data from DATA.txt, which is a tab-delimited file. The format of the data contained in DATA.txt is shown in Table 2. The first column is the SID of the protein. The second, third, and fourth columns are charge, mass, and abundance, respectively. Another input to the program that we have developed is read in from GRAPH.txt. This file is a tab-delimited file containing the neighborhood information of N proteins as shown in Table 3. The first column shows the vertex number. The second column shows the number of neighbors having labels less than the vertex.
The third column shows the label of the vertex that is adjacent to the vertex in column one. Table 3 shows the representation of N = 20 proteins. Consider these proteins to be a graph of N proteins in a chain. We can represent this graph as an adjacency matrix, say Adj, which has 1 as its Adj[i,i+1] and Adj[i+1,i] elements and 0 as all other elements:

\[
\mathrm{Adj}[i,i+1] = \mathrm{Adj}[i+1,i] = 1, \qquad \mathrm{Adj}[i,j] = 0 \quad \text{otherwise}
\]
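The chain adjacency matrix defined above can be sketched in a few lines. The example below is a minimal illustration in Python rather than the R code used in this work; the function names are hypothetical. It also relies on the standard closed form for the spectrum of a chain on n vertices, 2 cos[kπ/(n + 1)] with eigenvector components sin[jkπ/(n + 1)], and checks one eigenpair by direct multiplication.

```python
import math

def chain_adjacency(n):
    """Adjacency matrix of a chain on n vertices: Adj[i][i+1] = Adj[i+1][i] = 1."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n - 1):
        adj[i][i + 1] = 1
        adj[i + 1][i] = 1
    return adj

def chain_eigenpair(n, k):
    """Closed-form k-th eigenpair (k = 1..n) of the chain adjacency matrix."""
    theta = k * math.pi / (n + 1)
    value = 2.0 * math.cos(theta)
    vector = [math.sin((j + 1) * theta) for j in range(n)]
    return value, vector

def residual(adj, value, vector):
    """Largest entry of |Adj v - value * v|; close to zero for a true eigenpair."""
    n = len(adj)
    return max(
        abs(sum(adj[i][j] * vector[j] for j in range(n)) - value * vector[i])
        for i in range(n)
    )

adj = chain_adjacency(20)
lam, vec = chain_eigenpair(20, 1)
print(round(lam, 4))            # leading eigenvalue of the 20-vertex chain
print(residual(adj, lam, vec))  # numerical check that Adj v = lam v
```

For N = 20 the leading value is 2 cos(π/21), which is the largest entry that a purely real, unweighted chain can produce.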
Once we have defined a graph-theoretical representation of the proteome, the question is how to seek an invariant that truly characterizes the underlying pattern in the proteomics map without losing the vital physical characteristics of the proteins such as charge and mass. A structural invariant is a mathematical function or a quantity that does not depend on labeling of the structure, its orientation, or the labels of vertices, and to the best possible extent, it characterizes the object such as DNA sequences or proteomics maps uniquely. While the uniqueness may not always be accomplished, the invariance to labels and representations can be achieved. These invariants would serve as descriptors and also as pattern recognition tools for the proteomics maps. While generation of such invariants may lead to characterization of the pattern, there could also be some loss of information since the original pattern may have more information than that which can be characterized by a small number of functions. We endeavor to formulate more than one such structural invariant so that as much of the information contained in the proteomics maps can be characterized as possible. Thus, we propose here a number of structural descriptors, which have not only vectorial and complex features but also scalar features, thus capturing multidimensional features of the proteome.

Figure 3. Zigzag graph obtained from the proteomic map in Figure 1 by considering 20 principal components of proteins with large abundance and connecting the vertices in the order of relative abundance.

Table 2. Normalized Relative Abundance of Various Proteins with Their Masses and Charges

SID   charge   mass   abundance
187   0.71     1.00   1
77    0.94     0.40   0.995
22    0.40     0.42   0.947
52    0.73     0.41   0.881
134   0.90     0.52   0.821
20    0.51     0.36   0.796
62    0.45     0.59   0.778
67    0.96     0.34   0.754
48    0.47     0.49   0.680
96    0.82     0.18   0.648
9     0.50     0.29   0.623
75    1.00     0.34   0.601
36    0.70     0.36   0.588
2     0.22     0.29   0.571
250   0.96     0.72   0.568
44    0.68     0.40   0.554
84    0.93     0.34   0.553
80    0.78     0.43   0.504
15    0.35     0.38   0.500
176   0.85     0.60   0.481

Table 3. Neighborhood Information of the Proteomics Pattern

vertex   no. of neighbors   neighbor
1        0                  0
2        1                  1
3        1                  2
4        1                  3
5        1                  4
6        1                  5
7        1                  6
8        1                  7
9        1                  8
10       1                  9
11       1                  10
12       1                  11
13       1                  12
14       1                  13
15       1                  14
16       1                  15
17       1                  16
18       1                  17
19       1                  18
20       1                  19

The characteristic polynomial is obtained using the algorithm POLY described in the papers by Balasubramanian.24-26 The original POLY was in Fortran; an R version of POLY, char.poly(), was coded. The function char.poly() expects the lower triangle of the adjacency matrix. The lower triangle of an N × N matrix is a single dimensional array (vector in R terminology) with N(N + 1)/2 elements. Since the adjacency matrix is symmetrical, we only need the lower triangle. A function read.graph() reads information from GRAPH.txt and builds the lower triangle of the adjacency matrix as a single dimensional vector. The eigenvalues are obtained using the eigen() function from the R libraries. However, the input parameter x to eigen() is the entire adjacency matrix. A function full.adj.matrix() was coded to convert the lower triangle of the adjacency matrix to the full adjacency matrix. Another parameter to eigen() is symmetric; this parameter is set to TRUE if the input matrix is symmetric and FALSE otherwise. In the code, this parameter is set by default to FALSE since the code may have nonsymmetric matrices too. A new matrix, say Adj1, is obtained by changing the diagonal elements of Adj to (charge) + i(mass):

\[
\mathrm{Adj1}[i,i+1] = \mathrm{Adj1}[i+1,i] = 1, \qquad \mathrm{Adj1}[i,i] = \mathrm{charge}_i + \sqrt{-1}\,\mathrm{mass}_i, \qquad \mathrm{Adj1}[i,j] = 0 \quad \text{elsewhere}
\]
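The lower-triangle storage and the complex diagonal described above can be illustrated as follows. This is a sketch in Python, not the R implementation: `read_graph` and `full_adj_matrix` are hypothetical analogues of read.graph() and full.adj.matrix(), and the row-major packing order of the lower triangle is an assumption, since the text does not state the order used. The input rows follow the layout of Table 3, and the weights are the first four charge and mass entries of Table 2.

```python
def read_graph(rows, weights=None):
    """Pack the lower triangle (diagonal included) of the adjacency matrix.

    rows    : (vertex, number_of_lower_neighbors, neighbor, ...) with 1-based
              labels, in the layout of Table 3
    weights : optional complex diagonal weights, charge + i*mass
    The row-major packing order is an assumption of this sketch.
    """
    n = len(rows)
    lower = [0] * (n * (n + 1) // 2)
    for vertex, count, *neighbors in rows:
        for nb in neighbors[:count]:
            i, j = vertex - 1, nb - 1          # neighbor label is below the vertex
            lower[i * (i + 1) // 2 + j] = 1
    if weights is not None:
        for i, w in enumerate(weights):
            lower[i * (i + 1) // 2 + i] = w    # complex weight on the diagonal
    return lower

def full_adj_matrix(lower, n):
    """Expand the packed lower triangle into the full symmetric n x n matrix."""
    full = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            full[i][j] = full[j][i] = lower[i * (i + 1) // 2 + j]
    return full

# First four vertices of Table 3 and the matching rows of Table 2.
rows = [(1, 0, 0), (2, 1, 1), (3, 1, 2), (4, 1, 3)]
charge = [0.71, 0.94, 0.40, 0.73]
mass = [1.00, 0.40, 0.42, 0.41]
weights = [complex(c, m) for c, m in zip(charge, mass)]

adj1 = full_adj_matrix(read_graph(rows, weights), len(rows))
for row in adj1:
    print(row)
```

Passing `weights=None` gives the unweighted matrix Adj, so the same two helpers cover both the real and the complex case.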
The read.graph() function with input parameter comp = TRUE (comp for complex) is used to build Adj1. The characteristic polynomial of a graph is defined as the secular determinant polynomial of the adjacency matrix. The direct computation of the determinant is an n! order problem and thus becomes intractable for large graphs. The technique used here is instead based on computing powers of the adjacency matrix of the graph and finding the traces of these matrices. One of the authors24-26 has developed a powerful code and algorithm for the characteristic polynomials of graphs. Since the adjacency matrices of the graphs generated from proteomics maps as defined above are complex, we have to generalize these algorithms and techniques for graphs with complex weights. We have done this in the current work by expressing all complex arithmetic operations in terms of real functions. The characteristic polynomial is thus obtained using the char.poly() function mentioned above. Since R can recognize complex and real numbers, the same version of char.poly() works for both real and complex matrices. The eigenvalues are obtained using the eigen() function from the R libraries. Yet another parameter to eigen() is a logical parameter called "only.values"; this parameter should be TRUE if we need only eigenvalues. Since we are also seeking eigenvectors, by default, this parameter is set to a logical value of FALSE. The spectral decomposition of the input matrix x is returned as a list, for example, e1. There are two components of the list e1; e1$values represents the eigenvalues, and e1$vectors is a complex matrix of order N × N, whose columns represent the eigenvectors. A function get.min.max() was coded to obtain the indices of the minimum and maximum eigenvalues from e1$values. Using these indices, the eigenvectors for the smallest and the largest eigenvalues are extracted. The eigenvector of the largest eigenvalue is called the principal eigenvector, which provides information on the participation of various components in a complex plane.
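A trace-based route to the characteristic polynomial can be sketched with the Faddeev-LeVerrier recurrence, which builds the coefficients from traces of successive matrix products. The Python code below is an illustration under that assumption; it is not the POLY or char.poly() code itself, and whether POLY follows exactly this recurrence is not stated here. Because Python, like R, handles complex numbers natively, the same function serves real and complex matrices.

```python
def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(a):
    """Coefficients of det(xI - A), highest power first.

    Faddeev-LeVerrier recurrence:
        M_k = A M_{k-1} + c_{k-1} I,   c_k = -trace(A M_k) / k,
    starting from M_0 = 0 and c_0 = 1.
    """
    n = len(a)
    coeffs = [1]
    m = [[0] * n for _ in range(n)]
    for k in range(1, n + 1):
        m = mat_mul(a, m)
        for i in range(n):
            m[i][i] += coeffs[-1]
        am = mat_mul(a, m)
        trace = sum(am[i][i] for i in range(n))
        coeffs.append(-trace / k)
    return coeffs

# Unweighted chain on 20 vertices (the matrix Adj).
chain = [[1 if abs(i - j) == 1 else 0 for j in range(20)] for i in range(20)]
chain_coeffs = char_poly(chain)
print([round(c) for c in chain_coeffs])

# Two-vertex complex example: diagonal weights charge + i*mass from Table 2.
a, b = complex(0.71, 1.00), complex(0.94, 0.40)
complex_coeffs = char_poly([[a, 1], [1, b]])
print(complex_coeffs)   # expected form: 1, -(a + b), a*b - 1
```

The recurrence needs on the order of n matrix products instead of the n! terms of a direct determinant expansion. It can lose accuracy for large or badly scaled matrices, which is one more reason to work with normalized data.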
We have plotted the principal eigenvector to provide insight into the abundance and chemically induced changes as a function of the chemical. All of this information is obtained from e1$vectors. Eigenvalues obtained from Adj1 are complex in nature, and their real and imaginary parts are plotted along the X and Y axes. The eigenvectors corresponding to the largest and smallest eigenvalues are also plotted in a complex grid. The D/D matrix approach of Randić et al. consists of two parts, one called graph distance and the other called Euclidian distance. These two distances measure the shortest topological (connectivity) and geometrical distances, respectively. We have defined a Euclidian matrix E whose diagonal elements are the relative abundances, abundance[i], and whose off-diagonal elements E[i,j] are the Euclidian distances between proteins i and j:

\[
E[i,j] =
\begin{cases}
\sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} & \text{if } i \neq j \\
\mathrm{abundance}[i] & \text{if } i = j
\end{cases}
\]
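The definition of E translates directly into code. The sketch below is in Python with a hypothetical function name, and uses the first four rows of Table 2 as input.

```python
import math

def euclid_matrix(x, y, abundance):
    """E[i][j] = Euclidean distance between proteins i and j; E[i][i] = abundance[i]."""
    n = len(x)
    e = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                e[i][j] = abundance[i]
            else:
                e[i][j] = math.hypot(x[i] - x[j], y[i] - y[j])
    return e

# First four rows of Table 2 (normalized charge, mass, abundance).
charge = [0.71, 0.94, 0.40, 0.73]
mass = [1.00, 0.40, 0.42, 0.41]
abundance = [1.0, 0.995, 0.947, 0.881]

E = euclid_matrix(charge, mass, abundance)
trace = sum(E[i][i] for i in range(len(E)))
print(E[0][1], trace)
```

Because the diagonal carries the abundances, the trace of the full 20 × 20 matrix is the sum of the abundance column of Table 2, 13.843, which agrees in magnitude with the second coefficient of the Euclidian characteristic polynomial in Table 6.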
An important normalized graph signature is the D/D matrix, whose off-diagonal elements are set to the ratio of the Euclidian and topological distances on the weighted graph, where graph distances are used in combination with Euclidian distances as shown by Randić et al.19,20 There are also physical interpretations for the mathematical invariants. For example, it has been suggested that the principal eigenvalue of D/D matrices measures the degree of folding of structures.19,20 Several powers of the above D/D matrix generate higher-order invariants, and their leading eigenvalues (λ_1^k) were considered earlier as descriptors. In this work, we have presented a complementary approach that involves complex matrices, their eigenvalues, and eigenvectors.
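A D/D matrix for a chain-like proteomics graph might be assembled as below. This Python sketch rests on an assumption that should be checked against refs 19 and 20: the graph distance between two vertices is taken as the length measured along the zigzag path, that is, the sum of the Euclidean lengths of the edges between them. Using the number of edges |i - j| instead would change only the denominator. The function name is hypothetical.

```python
import math

def dd_matrix(x, y):
    """Off-diagonal entries: Euclidean distance / distance along the chain; diagonal: 0.

    Assumes the vertices are listed in chain order and that consecutive
    vertices do not coincide, so that no path length is zero.
    """
    n = len(x)
    # Cumulative length of the zigzag path from vertex 0 to each vertex.
    along = [0.0]
    for i in range(1, n):
        along.append(along[-1] + math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]))
    dd = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                euclid = math.hypot(x[i] - x[j], y[i] - y[j])
                dd[i][j] = euclid / abs(along[i] - along[j])
    return dd

# First five rows of Table 2 (normalized charge and mass), in order of abundance.
charge = [0.71, 0.94, 0.40, 0.73, 0.90]
mass = [1.00, 0.40, 0.42, 0.41, 0.52]
DD = dd_matrix(charge, mass)
for row in DD:
    print([round(v, 3) for v in row])
```

Under this assumption every entry lies between 0 and 1. Adjacent vertices give exactly 1, and strongly folded stretches of the path give small values.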
As indicated in the Introduction, we have considered here proteomics maps obtained for protein patterns derived for normal liver cells and liver cells taken from rats that were exposed to four different peroxisome proliferators, namely, perfluorooctanoic acid (PFOA), perfluorodecanoic acid (PFDA), clofibrate, and diethylhexyl phthalate (DEHP). All experimental data that we consider here were obtained by Witzmann and co-workers in the Molecular Anatomy Laboratory of the Department of Biology, Indiana University and Purdue University, Columbus, IN.18 Each of these chemicals induces changes to the proteome, which should then be reflected in the complex algebraic and graph-theoretical generators that we have obtained. Likewise, the sequence of the leading eigenvalues of ^kD/^kD matrices can also provide insight into the proteome and the action of various chemicals. They can be viewed as "biodescriptors" that characterize the state of cellular proteomes, and in general as descriptors that characterize biological systems under various external or internal perturbations. The experimental techniques for the extraction of the 2D-gel patterns have been described adequately in a previous paper.2 Here, we briefly note that the 2D-gel maps were obtained from male Fischer-344 rats (225-250 g) from Charles River Breeding Labs. PFDA and PFOA were dissolved in propylene glycol and water, 1:1 by volume, and concentration-adjusted so that the dose volume did not exceed 0.5 mL. Rats were injected intraperitoneally with the above solutions with exposures of 2 mg (n = 5), 20 mg (n = 5), and 50 mg PFDA/kg body weight (n = 9), by single injection, animals sacrificed on day 8 of exposure; 50 mg PFDA/kg body weight (n = 5), by single injection, animals sacrificed 30 days after exposure; and 150 mg PFOA/kg body weight (n = 8), by single injection, animals sacrificed on day 3 of exposure.
Clofibrate (ethyl-p-chlorophenoxyisobutyrate) was administered as neat oil, 250 mg clofibrate/kg body weight, single intraperitoneal injection on each of 3 successive days, animals sacrificed on day 5 of exposure (n = 10). DEHP was administered as neat oil, via oral gavage, 1200 mg/kg, animals sacrificed on day 5 of exposure (n = 3). Matched control rats were vehicle-injected and pair-fed (PFC; n = 10), while one group (Ad Lib; n = 6) served as free-eating controls. The 2D electrophoretic technique was employed to get a proteomics map, since the technique has the ability to resolve thousands of cellular proteins based first on their content of acidic and basic amino acids (isoelectric focusing) and second by molecular weight (SDS electrophoresis). The effects of various chemicals were also reflected in the proteomic patterns. In the present study, we are working with these 2D-gel data to mathematically characterize the proteomics patterns and the effects of various chemicals on the proteome. The 2D-gel data contain measurements for charge and mass which we represent in a complex plane, where the real part is the charge and the imaginary part is the mass. The experimental data in absolute terms contain the x and y coordinates in the range (0 < x < 3000 and 0 < y < 2500), while the abundance is measured in units yielding entries several orders of magnitude larger. For the control data, which measure the abundance of proteins without exposure to any chemical, the abundance is in the range (0, 137 000), but in the presence of toxic substances, the abundance can increase even above 200 000 (for protein no. 20 of F344 liver PFDA). We have thus renormalized the experimental entries by setting the largest positive value to unity and scaling all numbers relative to that. This also keeps the matrices, eigenvectors, and eigenvalues within numerical bounds and without subjecting them to numerical overflows. The overflows can become problematic particularly for higher powers of the D/D matrix.

Table 4. Results from Adjacency Matrix

real characteristic polynomial:
1, 0, -19, 0, 153, 0, -680, 0, 1820, 0, -3003, 0, 3003, 0, -1716, 0, 495, 0, -55, 0, 1

real eigenvalues:
-1.9777, 1.9777, -1.9111, 1.9111, 1.8019, -1.8019, -1.6525, 1.6525, 1.4661, -1.4661, -1.247, 1.247, -1, 1, 0.7307, -0.7307, 0.445, -0.445, -0.1495, 0.1495

Table 5. Results from Adjacency Matrix with Diagonal Elements as Charge + i(Mass)

complex characteristic polynomial:
1 + 0i, -8.86 - 13.86i, -72.7393 + 116.5663i, 777.6914 + 149.3978i, -464.6874 - 3456.4613i, -11252.2716 + 5065.7978i, 22906.4781 + 27992.7936i, 53698.4071 - 71512.0605i, -172125.0388 - 77250.3828i, -73236.1264 + 333913.1848i, 533380.8604 + 13071.4797i, -109820.7759 - 708606.7969i, -785103.1309 + 260111.145i, 371819.4414 + 723219.8399i, 549065.606 - 391400.2502i, -317515.0114 - 338270.7795i, -165004.3099 + 199079.6725i, 94325.2473 + 61297.0751i, 16261.2879 - 32060.8262i, -7015.8374 - 2733.3932i, -217.5977 + 747.4312i

complex eigenvalues:
2.3705 + 0.6697i, 2.3344 + 0.7247i, 2.1922 + 0.6997i, 2.0526 + 0.6762i, 1.9004 + 0.6774i, 1.7166 + 0.7635i, -1.5898 + 0.69i, -1.4383 + 0.683i, 1.4541 + 0.6431i, -1.3874 + 0.7217i, -1.2018 + 0.6627i, 1.1567 + 0.684i, 0.9577 + 0.7404i, -1.0129 + 0.614i, -0.7772 + 0.7042i, 0.6562 + 0.6222i, -0.5551 + 0.7053i, 0.3033 + 0.7229i, -0.0019 + 0.7539i, -0.2701 + 0.7014i
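The renormalization described above amounts to dividing each column by its largest entry. A minimal Python sketch, with a hypothetical function name and the first four control abundances of Table 1 as input, is:

```python
def normalize_by_max(values):
    """Scale positive data so that the largest entry becomes 1.0."""
    top = max(values)
    return [v / top for v in values]

# First four control abundances listed in Table 1.
control = [136653, 127195, 114929, 112251]
scaled = normalize_by_max(control)
print([round(v, 3) for v in scaled])

# A normalized (charge, mass) pair becomes one complex vertex weight.
weight = complex(0.71, 1.00)   # first row of Table 2
print(weight.real, weight.imag)
```

With all entries in the interval (0, 1], repeated matrix products such as the higher powers of the D/D matrix stay within the floating-point range.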
3. Results and Discussion
The characteristic polynomial and eigenvalues from the matrix Adj, which is the adjacency matrix of the ordinary unweighted proteomics map, are shown in Table 4. The characteristic polynomial and eigenvalues of the complex matrix Adj1, which has diagonal elements weighted with complex weights corresponding to charge and mass, are shown in Table 5. As expected, in contrast to the results in Table 4, which do not consider the charge and mass of each protein, the results in Table 5 all have complex eigenvalues, wherein the real component can be thought of as the charge component, while the imaginary component corresponds to the mass. Clearly, the results in Tables 4 and 5 are substantially different, indicating how the charge and mass of the proteomics map play a critical role in determining the eigenvalues and eigenvectors, which are mathematical descriptors of the proteome. As can be seen from Table 5, the eigenvalues exhibit a larger spread along the charge variable, with this component varying from 2.3705 (largest) to 0.0019 (smallest in magnitude); the spread along the imaginary component, which corresponds to the mass, is between 0.6 and 0.8. This shows that there is a much smaller variation in the spectra along the mass axis and a larger spread along the charge axis. The eigenspectra shown in Table 5 correspond only to the control, that is, to the relative abundances in the absence of any external chemicals. Thus, the results in Table 5 can be viewed as a descriptor of the proteome itself. Figure 4 shows a graphical representation of the complex spectral map of the proteomics pattern. As seen from Figure 4, which gives more insight, the spread along the y-axis (mass) is much smaller than the spread along the x-axis (charge). There are n possible eigenvectors that are orthogonal to each other, one for each of the eigenvalues, where n is the number of proteins. For the sample data set, since we have considered the 20 most abundant proteins, we have 20 eigenvectors.
Figure 4. Plot of complex eigenvalues. The real and imaginary part plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.
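The spread of the control spectrum can be checked directly from the eigenvalues listed in Table 5. The Python sketch below computes the ranges of the real and imaginary parts, verifies that the eigenvalues sum to the trace of the matrix (the negative of the second coefficient of the characteristic polynomial in Table 5), and locates the eigenvalues of smallest and largest modulus. The helper `get_min_max` is a hypothetical analogue of get.min.max(); ranking by modulus is an assumption of this sketch.

```python
# Complex eigenvalues of the control matrix, copied from Table 5.
eigenvalues = [
    2.3705 + 0.6697j, 2.3344 + 0.7247j, 2.1922 + 0.6997j, 2.0526 + 0.6762j,
    1.9004 + 0.6774j, 1.7166 + 0.7635j, -1.5898 + 0.69j, -1.4383 + 0.683j,
    1.4541 + 0.6431j, -1.3874 + 0.7217j, -1.2018 + 0.6627j, 1.1567 + 0.684j,
    0.9577 + 0.7404j, -1.0129 + 0.614j, -0.7772 + 0.7042j, 0.6562 + 0.6222j,
    -0.5551 + 0.7053j, 0.3033 + 0.7229j, -0.0019 + 0.7539j, -0.2701 + 0.7014j,
]

def get_min_max(values):
    """Indices of the eigenvalues with the smallest and largest modulus."""
    moduli = [abs(v) for v in values]
    return moduli.index(min(moduli)), moduli.index(max(moduli))

real_parts = [v.real for v in eigenvalues]
imag_parts = [v.imag for v in eigenvalues]
real_spread = max(real_parts) - min(real_parts)
imag_spread = max(imag_parts) - min(imag_parts)

total = sum(eigenvalues)            # should match the trace, 8.86 + 13.86i
i_min, i_max = get_min_max(eigenvalues)

print(round(real_spread, 4), round(imag_spread, 4))
print(total)
print(eigenvalues[i_min], eigenvalues[i_max])
```

The real parts span several units while the imaginary parts stay within a band of roughly 0.15, which is the pattern visible in Figure 4.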
Among these, the eigenvector corresponding to the largest eigenvalue in norm, called the principal component vector, is an important structural descriptor. This vector is plotted in Figure 5 along the charge and mass axes, respectively. We expect this pattern of principal eigenvectors to be a unique descriptor for a given proteome and hence a very useful descriptor of the proteome.

Figure 5. Plot of the eigenvector corresponding to the largest principal eigenvalue. The real and imaginary parts are plotted on the X- and Y-axis, respectively.

As discussed in Section 2, we have also considered the Euclidian distance matrix of the proteins. Table 6 shows the characteristic polynomial, principal eigenvalue, and eigenvector, as well as the smallest eigenvalue and eigenvector of the Euclidian matrix. Note that the Euclidian matrix measures the shortest geometrical distance between the proteins on the (x,y)-grid. The Euclidian spectra are not too interesting by themselves, as the principal eigenvalue stands out as a large number and the remaining values are small. This is quite typical of purely distance-based measures as shown by one of the authors27,28 in the context of distance spectra, Euclidian distances28 and distance polynomials.27 It can be easily shown2 that for an unweighted graph the principal eigenvalue, λ1, of the D/D matrix asymptotically reaches the value 2 cos[π/(n + 2)], the leading eigenvalue of the adjacency matrix of a chain of length n.

Table 6. Characteristic Polynomials and Eigenvalues of the Euclidian Matrix

characteristic polynomial from Euclidian matrix:
1, -13.843, 57.43, -85.756, -74.542, 530.51, -1018.199, 1061.374, -545.815, -114.473, 464.774, -437.719, 251.493, -98.453, 26.24, -4.31, 0.233, 0.06, -0.086, -0.453, -9.521

eigenvalues from Euclidian matrix:
8.106, -2.203, 0.886, -0.833, 0.824, 0.781, 0.689, 0.686, 0.65, 0.644, 0.533, 0.51, 0.489, 0.45, 0.435, 0.407, 0.356, 0.3, 0.285, -0.151

eigenvector of smallest principal value:
0.015, 0.24, -0.275, 0.062, 0.218, -0.195, -0.212, 0.267, -0.23, 0.117, -0.212, 0.296, 0.027, -0.396, 0.224, 0.005, 0.268, 0.133, -0.357, 0.191

eigenvector of largest principal value:
0.361, 0.212, 0.226, 0.171, 0.202, 0.195, 0.226, 0.218, 0.2, 0.232, 0.207, 0.229, 0.169, 0.305, 0.26, 0.165, 0.203, 0.166, 0.233, 0.197

The most interesting trends are obtained by plotting the eigenvalues and the principal eigenvectors on the same charge-mass grid of the proteomics map. We shall see that such plots characterize not only the proteomics map but also the chemically induced changes to the proteome by the various peroxisome proliferators that we have considered here, namely, clofibrate, DEHP, PFDA, and PFOA. We discuss these eigenvalue and eigenvector maps and show that they are novel descriptors of the proteome and its responses to chemicals or toxicants. Figure 6 shows the complex spectral map and complex principal eigenvector corresponding to the data obtained from rat liver cells exposed to clofibrate. The real and imaginary parts are plotted on the x- and y-axis, and they represent the charge and mass components in the original matrix, respectively. The corresponding plots for DEHP are in Figure 7, for PFDA in Figure 8, and for PFOA in Figure 9. A uniform feature of all complex eigenspectral maps is that the spread is larger along the charge axis than along the mass axis. Note that for each of the plots, the original matrices measure the perturbation caused by the chemicals, as the diagonal elements are the differences between the data obtained after exposure to the chemical and the data of the control. Thus, the spectral maps are true reflections of the effects of chemicals on the rat cell. The most interesting information is obtained by considering the distance and vectorial positions of the smallest and largest eigenvalues of the eigenspectra of the four chemicals that we have considered here. As can be seen from Figures 6-9, DEHP stands out in exhibiting the largest spread or Euclidian distance between its smallest and principal eigenvalues (the distance between the triangle and square in each figure). As can be seen from these figures, the vectorial relative positions of the smallest and largest eigenvalues on the complex grid also differ for DEHP compared to the other three chemicals. Whereas DEHP shows substantial vertical displacement between the smallest and largest eigenvalues, this is not the case for the other three chemicals (see Figures 6-9). The information-theoretic analyses by Basak et al.11 of the proteomics patterns of PFOA, PFDA, clofibrate, and DEHP using 10, 200, 500, and 1054 spots show DEHP to be substantially different from the other three peroxisome proliferators.
It is interesting to note that when one wants to use a large number of spots, for example, >1000 spots, to characterize a small number of maps, such as those of the four peroxisome proliferators, the number of independent variables (spots) is overwhelming. To address this, Basak and co-workers30 have attempted various approaches to develop a small number of compact descriptors. Leading eigenvalues of the D/D matrix formulated by Randić et al.2 and the spectrum-like descriptors developed by Vračko et al.3 are examples of compact descriptors that condense the information present in the map into a few numerical parameters. As more data on the effects of more numerous chemicals on cellular/biological systems become available, the utility of such descriptors can be tested. The complex norm of the principal eigenvalues corresponding to the four chemicals would measure the deviation from the unperturbed proteomics map. Since we have subtracted the
Figure 6. Complex spectral map and complex principal eigenvector of clofibrate on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.
control from the diagonal elements of the perturbation matrix, a zero eigenvalue would result if there were no perturbation; thus, the deviation from zero measures the perturbation of the proteome by the chemical. That is to say, the chemical whose principal eigenvalue has the greatest norm causes the largest perturbation, while the one with the smallest norm causes the least perturbation. One may recall that the norm of a complex variable is the square root of the sum of the squares of its real and imaginary parts. On the basis of this, we find that the principal eigenvalues for the four chemicals considered here have the norms 2.50, 2.98, 3.03, and 3.17 for PFOA, clofibrate, PFDA, and DEHP, respectively. This suggests that PFOA exerts the least perturbation on the proteome among the four chemicals or is the least toxic among them. This conclusion is consistent with the one arrived at by Randić et al. with their D/D matrix method.2 However, we find that clofibrate, PFDA, and PFOA all have similar perturbations, but DEHP stands out as being the most toxic and most contrasting in the complex eigenspectra. Randić et al.2 have obtained the result that clofibrate is the most toxic
Journal of Proteome Research • Vol. 5, No. 5, 2006
Figure 7. Complex spectral map and complex principal eigenvector of DEHP on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.
on the basis of the D/D matrix approach. However, it is interesting that in both cases PFOA stands out as being the least toxic, and in our approach, the contrast among clofibrate, PFDA, and PFOA is less, whereas DEHP stands out.
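The ranking argument can be reproduced with a few lines of Python. The sketch below uses only the four norms reported above; the helper name `complex_norm` and the sample value 3 + 4i are illustrative assumptions, not data from the study.

```python
import math


def complex_norm(z: complex) -> float:
    """Norm (modulus) of a complex number: sqrt(Re^2 + Im^2)."""
    return math.hypot(z.real, z.imag)


# Illustrative value only (not from the study): |3 + 4i| = 5
example = complex_norm(3 + 4j)

# Norms of the principal eigenvalues as reported in the text
reported_norms = {"PFOA": 2.50, "clofibrate": 2.98, "PFDA": 3.03, "DEHP": 3.17}

# Rank the chemicals from least to greatest perturbation
ranking = sorted(reported_norms, key=reported_norms.get)

# Differences between consecutive norms in that ranking
gaps = [
    round(reported_norms[b] - reported_norms[a], 2)
    for a, b in zip(ranking, ranking[1:])
]
```

Here `ranking` places PFOA first and DEHP last, and `gaps` shows that the step from PFOA to clofibrate (0.48) is the largest of the three.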
Figure 8. Complex spectral map and complex principal eigenvector of PFDA on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.
4. Conclusions
We have developed graph-theoretical complex matrix representations of the relationship of the 2D density of gel spots obtained from the cell proteome via 2D electrophoresis/chromatography. In this method, a graph is obtained by connecting the gel spots of the proteomics map in the order of their relative abundance, and the diagonal elements of the graph matrix are weighted by a complex variable. The complex weight assigned to each vertex has the charge as its real part and the mass as its imaginary part. In this manner, both charge and mass information of the proteins comprising the proteomics map have been considered. We have shown that the eigenspectra of the complex matrix and its principal eigenvector yield important insight into the proteome. The principal eigenvalue and the principal eigenvector seem to provide novel complex descriptors of the proteome. The perturbations caused to the cell by four peroxisome proliferators, namely, clofibrate, DEHP, PFDA, and PFOA, have been modeled by complex variable proteomics graphs. The complex eigenspectral maps and the map of the principal eigenvector were shown to characterize the perturbations caused by these chemicals. We have used the norm of the principal eigenvalue as a descriptor of the extent of toxicity, which seems to be in accord with experiment. On the basis of the norm of the principal eigenvalue, it was shown that PFOA causes the least toxicity or
Figure 9. Complex spectral map and complex principal eigenvector of PFOA on the rat liver cell. The real and imaginary parts are plotted on the X- and Y-axis, respectively. Square, smallest principal value; triangle, largest principal value.
perturbation to the cell, while PFDA, DEHP, and clofibrate cause comparable perturbations, with DEHP causing the greatest perturbation among these. On the basis of the proteomics maps, it was shown that DEHP stands out as having a different eigenspectral map from those of the other three chemicals, namely, PFOA, clofibrate, and PFDA, which are mutually similar. The largest and smallest eigenvalues of DEHP show not only the greatest distance but also substantial angular variation compared to the other three chemicals. Both eigenvalues exhibit very little variation along the imaginary axis for PFOA, clofibrate, and PFDA, whereas there is a large displacement along the vertical direction, or imaginary axis, in the case of DEHP. This vectorial feature and variation can only be characterized by a complex representation such as the one we have considered here.
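These vectorial features can be quantified directly from the two extreme eigenvalues. The following Python sketch is one possible reading of "distance" and "angular variation", taking the angle of the displacement vector from the smallest to the largest eigenvalue; the function name and both sample pairs are hypothetical and are not values read from Figures 6-9.

```python
import cmath
import math


def displacement_features(smallest: complex, largest: complex) -> dict:
    """Distance, components, and angle of the vector from smallest to largest eigenvalue."""
    d = largest - smallest
    return {
        "distance": abs(d),                         # Euclidian distance on the complex grid
        "horizontal": d.real,                       # displacement along the real (charge) axis
        "vertical": d.imag,                         # displacement along the imaginary (mass) axis
        "angle_deg": math.degrees(cmath.phase(d)),  # direction of the displacement
    }


# Hypothetical pair with purely horizontal separation
flat = displacement_features(-1 + 0j, 2 + 0j)

# Hypothetical pair with a large vertical component
tilted = displacement_features(-1 - 1j, 2 + 3j)
```

In this toy comparison, the first pair has zero vertical displacement and a zero angle, whereas the second pair is both farther apart and rotated away from the real axis, which is the kind of contrast described for DEHP relative to the other three chemicals.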
While these approaches seem to provide principal eigenvalues and eigenvectors for the proteomics maps and for the perturbations induced by chemicals upon the cell, there is considerable room to generalize these methods. For example, at present, our approach considers only the mass and charge of each gel spot, but there is more information on each spot, such as the amino acid sequence and properties of the protein in each spot. Mathematical characterization of such latent information is far more complex than what we have considered here. Such studies could be topics of future investigations.
Acknowledgment. The research at California State University East Bay was supported by the National Science Foundation under Grant No. CHE-0236434. The work at LLNL was performed in part under the auspices of the U.S. Department of Energy by the University of California, LLNL under contract number W-7405-Eng-48. The work at NRRI was supported by Grant F49620-01-1-0098 from the United States Air Force Office of Scientific Research. The authors extend their thanks to Brian Gute, Natural Resources Research Institute of UMD, Duluth, for insightful comments.
References
(1) Randić, M.; Lerš, N.; Plavšić, D.; Basak, S. C. J. Proteome Res. 2004, 3, 778-785.
(2) Randić, M.; Witzmann, F.; Vračko, M.; Basak, S. C. Med. Chem. Res. 2001, 10, 456-479.
(3) Vračko, M.; Basak, S. C. Chemometr. Intell. Lab. Syst. 2004, 70, 33-38.
(4) Blackstock, W. P.; Weir, M. P. Trends Biotechnol. 1999, 17, 121-127.
(5) Cutler, P.; Birrell, H.; Haran, M.; Man, W.; Neville, B.; Rosier, S.; Skehel, M.; White, I. Biochem. Soc. Trans. 1999, 27, 555-559.
(6) O'Farrell, P. Z.; Goodman, H. M.; O'Farrell, P. H. Cell 1977, 12, 1133-1141.
(7) Klose, J.; Kobalz, U. Electrophoresis 1995, 16, 1034-1059.
(8) Anderson, N. L.; Taylor, J.; Hofmann, J. P.; et al. Toxicol. Pathol. 1996, 24, 72-76.
(9) Appel, R. D.; Hochstrasser, D. F. Methods Mol. Biol. 1999, 112, 363-381.
(10) Guo, X.; Randić, M.; Basak, S. C. Chem. Phys. Lett. 2001, 350, 106-112.
(11) Basak, S. C.; Gute, B. D.; Witzmann, F. WSEAS Trans. Inf. Sci. Appl. 2005, 2, 996-1001.
(12) Vračko, M.; Basak, S. C. Chemometr. Intell. Lab. Syst. 2004, 70, 33-38.
(13) Randić, M.; Novič, M.; Vračko, M. J. Chem. Inf. Model. 2005, 45, 1205-1213.
(14) Randić, M.; Zupan, J.; Balaban, A. T. Chem. Phys. Lett. 2004, 397, 247-252.
(15) Randić, M.; Vračko, M.; Nandy, A.; Basak, S. C. J. Chem. Inf. Comput. Sci. 2000, 40, 1235-1244.
(16) Randić, M.; Razinger, M. On characterization of 3D molecular structure. In From Chemical Topology to Three-Dimensional Geometry; Balaban, A. T., Ed.; Plenum Press: New York, 1977; pp 159-236.
(17) Bytautas, L.; Klein, D. J.; Randić, M.; Pisanski, T. Foldedness in linear polymers: A difference between graphical and Euclidean distances. In Discrete Mathematical Chemistry; Hansen, P., Fowler, P. W., Zheng, M., Eds.; DIMACS Series in Discrete Mathematical and Theoretical Computer Science; American Mathematical Society: Providence, RI, 2000; Vol. 51, pp 39-61.
(18) Witzmann, F. Molecular Anatomy Laboratory, Department of Biology, Indiana University and Purdue University, Columbus, IN 47203.
(19) Randić, M.; Krilov, G. Int. J. Quantum Chem. 1999, 75, 1017-1026.
(20) Randić, M. J. Chem. Inf. Comput. Sci. 1995, 35, 373-382.
(21) Anderson, N. L. Two-Dimensional Electrophoresis: Operation of the ISO-DALT System; Large Scale Biology Press: Washington, DC, 1991.
(22) Neuhoff, V.; Arold, N.; Taube, D.; Ehrhardt, W. Electrophoresis 1988, 9, 255-262.
(23) Anderson, N. L.; Giere, F. A.; Nance, S. L.; Gemmell, M. A.; Tollaksen, S. L.; Anderson, N. G. Fundam. Appl. Toxicol. 1987, 8, 39-50.
(24) Balasubramanian, K. Theor. Chim. Acta 1984, 65, 49-58.
(25) Balasubramanian, K. J. Comput. Chem. 1984, 5, 387-394.
(26) Balasubramanian, K. J. Comput. Chem. 1988, 9, 204-211.
(27) Balasubramanian, K. J. Comput. Chem. 1990, 11, 828-836.
(28) Balasubramanian, K. Chem. Phys. Lett. 1995, 232, 415-423.
(29) Hawkins, D. M.; Basak, S. C.; Karaker, J.; Geiss, K. T.; Witzmann, F. A. J. Chem. Inf. Model. 2006, 46, 9-16.
(30) Bajzer, Z.; Randić, M.; Plavšić, D.; Basak, S. C. J. Mol. Graphics Modell. 2003, 22, 1-9.
PR050445S