Canonical Labeling of Proteome Maps

Milan Randić,*,†,| Nella Lerš,‡ Damir Vukičević,§ Dejan Plavšić,‡ Brian D. Gute,| and Subhash C. Basak|

National Institute of Chemistry, P.O. Box 3430, 1001 Ljubljana, Slovenia, The Ruđer Bošković Institute, P.O. Box 180, HR-10002 Zagreb, Croatia, Department of Mathematics, University of Split, Nikole Tesle 12, 21000 Split, Croatia, and Natural Resources Research Institute, University of Minnesota at Duluth, 5013 Miller Trunk Highway, Duluth, Minnesota 55811

Received March 3, 2005

We propose a canonical labeling of proteome maps, which enables one to sort and catalog the maps in a simple way. The canonical label of a proteome map is based on the canonical labeling of the vertexes of the Hasse diagram embedded in the map; this labeling yields the adjacency matrix whose rows, when viewed as binary numbers, are the smallest possible such numbers. The use of the approach in documentation is illustrated with the proteome maps of liver cells of healthy male Fisher F344 rats and of rats treated with different peroxisome proliferators.

Keywords: proteome maps • canonical labeling • Hasse diagram • documentation • peroxisome proliferators

Introduction

Proteomics is the study of the complete expression profile of proteins of a given cell type.1 It relies heavily on 2-D gel electrophoresis, the only technique that permits simultaneous separation of up to around 10^4 protein components to form a proteome map. Proteome maps can be characterized by means of map biodescriptors, that is, map invariants encoding information of biological and chemical interest.2-10 The advantages of such a characterization are obvious: comparison of the maps, instead of being visual, qualitative, and slow, becomes quantitative, computerized, and fast. It is equally important to develop methods for numerical representation of proteome maps that enable one to sort and catalog the maps in a simple way. By representation we mean associating with the map a set of labels (codes), preferably those that can be directly manipulated by a computer.

In this article, we put forward a numerical representation of a proteome map based on the construction of a unique graph (Hasse diagram) embedded in the map and on the canonical labeling of its vertexes, which yields the adjacency matrix whose rows, when viewed as binary numbers, are the smallest possible such numbers. In this way, one can generate the canonical labels of the proteins of a proteome map as well as a canonical binary label of the proteome map as a whole, which enables one to sort, catalog, and compare proteome maps in a simple way. We will illustrate the use of the approach in documentation with the proteome maps of liver cells of healthy male Fisher F344 rats and of rats treated with different peroxisome proliferators.

* To whom correspondence should be addressed. E-mail: [email protected]. † National Institute of Chemistry. ‡ The Ruđer Bošković Institute. § University of Split. | University of Minnesota at Duluth.

10.1021/pr050049+ CCC: $30.25 © 2005 American Chemical Society

Partial Ordering on the Set of Proteins of a Proteome Map. The Cartesian product of two sets A and B, denoted by A × B, is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B. A binary relation R on two sets A and B is a subset of A × B. If (a, b) ∈ R, then we write aRb. When we say that R is a binary relation on a set A, we mean that R is a subset of A × A. A binary relation R ⊆ A × A is called a partial ordering or partial order if R is reflexive (aRa for all a ∈ A), antisymmetric (aRb and bRa imply a = b), and transitive (aRb and bRc imply aRc).11,12 A set A together with a partial ordering R is called a poset (partially ordered set) and is denoted by (A, R). A finite poset (A, R) is conventionally represented by a Hasse diagram, in which the elements of the set A are represented by small circles and the relation aRb is represented by an ascending line from a to b unless aRb is already implied by transitivity.

The Cartesian coordinates x and y of a protein spot on a proteome map reflect, respectively, the net charge, c, and the mass, m, of the protein making up the spot. Hence, to rank the proteins of a proteome map with respect to both net charge and mass, we associate a pair (x, y) with each of the proteins and then construct the poset (P, ≥), where P is the set of all the (x, y) pairs and ≥ is the "greater than or equal to" relation. Two proteins I and J of the map are called comparable if either (x_I, y_I) > (x_J, y_J) or (x_J, y_J) > (x_I, y_I). We also say that protein I dominates protein J if (x_I, y_I) > (x_J, y_J), which means that either x_I > x_J and y_I ≥ y_J, or x_I ≥ x_J and y_I > y_J. Clearly, (x_I, y_I) = (x_J, y_J) if and only if x_I = x_J and y_I = y_J. If either x_I > x_J and y_J > y_I, or x_J > x_I and y_I > y_J, then I and J are called incomparable. Figure 1 shows a hypothetical proteome map containing 20 protein spots labeled with the first 20 letters of the alphabet.
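The dominance relation and the covering pairs that make up a Hasse diagram, as defined above, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`dominates`, `comparable`, `hasse_edges`) and the four-spot coordinates are hypothetical and are not taken from the article.

```python
def dominates(p, q):
    """True if p = (x, y) dominates q: p >= q in both coordinates and p != q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def comparable(p, q):
    """Two points are comparable if they coincide or one dominates the other."""
    return p == q or dominates(p, q) or dominates(q, p)


def hasse_edges(spots):
    """Covering pairs (lower, upper): upper dominates lower and no third
    spot lies strictly between them, so the pair is not implied by transitivity."""
    edges = set()
    for a, pa in spots.items():
        for b, pb in spots.items():
            if dominates(pb, pa) and not any(
                dominates(pb, pc) and dominates(pc, pa) for pc in spots.values()
            ):
                edges.add((a, b))
    return edges


# Hypothetical (x, y) coordinates, chosen only for illustration.
spots = {"A": (1, 1), "B": (2, 3), "C": (3, 2), "D": (4, 4)}
print(sorted(hasse_edges(spots)))  # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D')]
print(comparable(spots["B"], spots["C"]))  # False
```

In this toy example D dominates A, but the line from A to D is omitted because it follows by transitivity through B (or C); B and C are incomparable.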
The spots at the top of the map belong to the proteins having large masses, and those at the bottom originate from the proteins having smaller masses. The proteins with higher net charges are located on the right side of the map, while those on the left side have lower net charges. We have constructed the poset (P = {(x_i, y_i) | i = A, B, ..., T}, ≥) associated with the map and its graphical representation, the Hasse diagram embedded in the map, shown in Figure 2.

Journal of Proteome Research 2005, 4 (No. 4), 1347-1352. Published on Web 06/22/2005.

Figure 1. Schematic representation of a hypothetical proteome map containing 20 protein spots labeled with the first 20 letters of the alphabet.

Figure 2. Canonically labeled Hasse diagram embedded in the proteome map shown in Figure 1. Numbers and letters at each site represent the labels of the vertexes and the labels of the corresponding protein spots, respectively.

Canonical Labeling of the Vertexes of Embedded Hasse Diagram and Proteome Maps. The critical step toward a canonical representation of a proteome map is the selection of a mathematical object that will serve as the basis of the representation and that is not only well-defined but also not too demanding to construct. The object that suggests itself for that purpose is a graph, in particular the Hasse diagram of (P, ≥) embedded in the map, the vertexes of which are canonically labeled. One of the earliest schemes for canonical labeling of vertexes in a graph was introduced by Morgan.13 Morgan's algorithm is based on the notion of extended connectivity of a vertex in a graph; it may lead to oscillatory behavior and occasionally may not yield the answer.14-17 Therefore, we have opted for another approach, based on the canonical labeling of vertexes of a graph that results in the canonical adjacency matrix, the rows of which, when read from left to right and from top to bottom as binary numbers, are the smallest binary numbers possible.18-20 In Figure 2 the approach is illustrated with the embedded Hasse diagram of the poset (P = {(x_i, y_i) | i = A, B, ..., T}, ≥) associated with the proteome map pictured in Figure 1. Observe that the Hasse diagram has vertexes of different degree, which facilitates the search for canonical labels
of the vertexes by allowing one to focus attention first on the vertexes of the lowest degree, because one of these vertexes will produce in the first row the smallest binary number. The only terminal vertexes in the Hasse diagram are those associated with protein spots Q and T. These vertexes have to be labeled with 1 and 2, and the vertexes adjacent to them with 20 and 19 respectively, in order that the first and the second row in the accompanying adjacency matrix read 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 and 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0, respectively. Labels 1, 2, 19, and 20 are tentatively assigned to the vertexes associated with spots Q, T, L, and P respectively. Now we will search for the location of the next smallest label, label 3. The vertex labeled with 3 has to have as few unlabeled nearest neighbors as possible and they have to be labeled with the greatest of the remaining unused labels. These two requirements ensure that the third row in the adjacency matrix is the smallest binary number among the binary numbers that can result from labeling these vertexes with unused labels. The only vertexes with two unlabeled nearest neighbors are the vertexes associated with protein spots D and S. If we label the vertex corresponding to spot D with 3, then the vertex associated with spot A has the nearest neighborhood in which only one vertex needs assignment of a label. In the case that the vertex associated with spot S is labeled with 3 then all of the unlabeled vertexes in the diagram have the nearest neighborhoods in which two or more vertexes need labels. Clearly, we have to label the vertexes corresponding to protein spots D, A, and B with 3, 4, and 16 respectively, and which of the vertexes associated with spots E and G takes label 17 and which label 18 will be decided later. 
Now the vertex associated with spot C is the only unlabeled vertex whose nearest neighborhood contains just one unlabeled vertex, the one corresponding to spot F, and hence we label these vertexes with 5 and 15, respectively. Observe that the vertex associated with spot P is adjacent to the vertex labeled with 5, and therefore its labeling with 20 was correct. As the nearest neighborhood of the vertex corresponding to spot S consists of only two vertexes, both unlabeled, and the vertex associated with spot K has three adjacent vertexes, two of which are unlabeled, we label the former vertex with 6 and the latter one with 7. The vertexes associated with spots M, N, and G have to be labeled with 12, 14, and 18, respectively, in order that the sixth and the seventh rows in the adjacency matrix be as small binary numbers as possible. Consequently, the vertexes corresponding to protein spots I and E take labels 13 and 17, respectively. Now the vertex associated with spot R has in its nearest neighborhood just one unlabeled vertex, corresponding to spot H. Hence, these two vertexes are labeled with 8 and 11, respectively. Only two unlabeled vertexes remain in the Hasse diagram, the vertexes associated with protein spots J and O. It is not difficult to see that assigning label 9 to the vertex associated with spot O yields a smaller binary number than the opposite assignment. Assigning label 10 to the vertex corresponding to spot J completes the canonical labeling of the vertexes of the Hasse diagram.

Note that the considered Hasse diagram possesses only the identity automorphism, and therefore only the described labeling of its vertexes leads to its canonical adjacency matrix. A permutation P acting on the set of vertexes of a graph G is an automorphism of G if it is adjacency preserving. If P is represented by a permutation matrix P, then P is an automorphism if and only if P⁻¹AP = A, where A is the adjacency matrix of G.
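The automorphism condition can be checked entry by entry: a permutation preserves adjacency exactly when A[i][j] equals A[p(i)][p(j)] for all i and j, which is the elementwise form of P⁻¹AP = A. The following is a minimal sketch, assuming a 0-based adjacency matrix stored as a list of lists; the function name and the three-vertex path graph are illustrative and not part of the article.

```python
from itertools import permutations


def is_automorphism(adj, perm):
    """adj: square 0/1 adjacency matrix; perm[i] is the image of vertex i."""
    n = len(adj)
    return all(
        adj[i][j] == adj[perm[i]][perm[j]] for i in range(n) for j in range(n)
    )


# Path graph 0 - 1 - 2 (hypothetical example).
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]

# Enumerate the automorphism group by testing every permutation.
autos = [p for p in permutations(range(3)) if is_automorphism(path, p)]
print(autos)  # [(0, 1, 2), (2, 1, 0)]
```

The path has two automorphisms (the identity and the end-to-end flip). A graph with only the identity automorphism, like the Hasse diagram of Figure 2, would return a single permutation; exhaustive enumeration of this kind is practical only for very small graphs.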
In general, the Hasse diagram associated with a proteome map can have more than one automorphism. If this is the case for a given embedded Hasse diagram, then more than one labeling of its vertexes leads to its canonical adjacency matrix. Among these labelings we select as the canonical labeling the one in which the labels of mutually equivalent vertexes are ordered in accordance with the lexicographic ordering of the (x, y) pairs associated with these vertexes. Given two sequences (a_1, a_2, a_3, ..., a_p) and (b_1, b_2, b_3, ..., b_q), where each a_i and each b_j is in some ordered set of characters, we say that the former sequence is lexicographically less than the latter one if either there exists an integer j, 1 ≤ j ≤ min(p, q), such that a_i = b_i for all i = 1, 2, 3, ..., j − 1 and a_j < b_j, or p < q and a_i = b_i for all i = 1, 2, ..., p.

Table 1 shows the canonical adjacency matrix of the Hasse diagram pictured in Figure 2. We will employ this matrix in the labeling of the proteome map shown in Figure 1. To wit, the map can be labeled (coded) with the canonical sequence of 20 binary numbers, each of which represents one row of the canonical adjacency matrix. If these binary numbers are transformed into decimal notation, then the sequence reads: 1, 2, 12, 28, 49, 192, 324, 832, 1376, 2689, 5148, ..., 558400.

Table 1. Canonical Adjacency Matrix of the Embedded Hasse Diagram Shown in Figure 2

0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0
0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 1
0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 1 0 0
0 0 0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 1 1
0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 1 0 0
0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1
0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0
0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0
0 0 1 1 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0

The question arises of how the outlined canonical labeling of proteome maps depends on the number of considered protein spots. It is not difficult to recognize that the canonical labeling of the vertexes of an embedded Hasse diagram is affected both by adding vertexes and by deleting them. Owing to that, we must select in advance the number of protein spots, n, that will be used for the construction of the canonical labels of the maps. For instance, one can start with n = 20, that is, one can select the 20 most intensive protein spots on the map, as has been assumed in our hypothetical proteome map shown in Figure 1. If this is found somewhat restrictive, one can increase n, for example to 50, 100, or 250.
As has been demonstrated in ref 5, an increase in the number of considered protein spots from 20 to 70 was not accompanied by a drastic increase in the complexity of the corresponding embedded Hasse diagram. This need not be surprising, because a Hasse diagram is sensitive to local features of a map, and locally proteome maps having 20, 200, or 2000 spots are similar. However, as the number of considered protein spots increases, finding the canonical labels of the vertexes in the corresponding embedded Hasse diagrams becomes more and more complex and demands the use of a well-defined computational procedure as well as a computer.
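Returning to the binary-row code of Table 1: reading row i of the canonical adjacency matrix as a 20-bit number, with column 1 as the most significant bit, reproduces the decimal sequence quoted in the text. The sketch below assumes that convention; the dictionary `UPPER` is a transcription of the upper triangle of Table 1 by canonical label, and the helper names are illustrative rather than the authors'.

```python
# Upper-triangle neighbours (label -> adjacent higher labels), transcribed from Table 1.
UPPER = {
    1: [20], 2: [19], 3: [17, 18], 4: [16, 17, 18], 5: [15, 16, 20],
    6: [13, 14], 7: [12, 14, 18], 8: [11, 12, 14], 9: [10, 12, 14, 15],
    10: [11, 13, 20], 11: [16, 17, 18], 12: [13, 19, 20], 13: [18],
    14: [19, 20], 15: [17],
}
N = 20


def neighbours(upper, n):
    """Symmetric neighbour sets built from the upper-triangle listing."""
    adj = {v: set() for v in range(1, n + 1)}
    for v, higher in upper.items():
        for w in higher:
            adj[v].add(w)
            adj[w].add(v)
    return adj


def row_codes(adj, n):
    """Row v read as an n-bit binary number; a 1 in column w contributes 2**(n - w)."""
    return [sum(1 << (n - w) for w in adj[v]) for v in range(1, n + 1)]


codes = row_codes(neighbours(UPPER, N), N)
print(codes[:11], codes[-1])
# [1, 2, 12, 28, 49, 192, 324, 832, 1376, 2689, 5148] 558400
```

The first eleven values and the last one agree with the sequence 1, 2, 12, 28, 49, 192, 324, 832, 1376, 2689, 5148, ..., 558400 given for the hypothetical map.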

As the process of arriving at the canonical labels of proteome maps does not include information on protein abundances, proteome maps containing the same proteins have the same canonical label. This makes it possible to immediately group all the proteome maps of a given cell type of a given species into the same class, under the same canonical label. If one wants to differentiate between individual proteome maps within a class, then it is necessary to incorporate information on protein abundances, because the abundance is a parameter that varies when the proteome alters. One way to include the abundance is to construct for a proteome map an n-component vector whose components are the protein abundances ordered according to the already established canonical labels of the vertexes of the Hasse diagram embedded in the map. We call such a vector a canonical abundance vector.

Algorithm for the Canonical Labeling of Vertexes of Hasse Diagram Embedded in a Proteome Map. The trivial algorithm for the canonical labeling of the vertexes of the Hasse diagram embedded in a proteome map is the algorithm that checks all the possible permutations P′ of the labels (in lexicographical order) and finds the optimal permutation P. Recall that a permutation of order n is a bijective function from the set {1, ..., n} onto the set {1, ..., n} and that p(i) is the value to which i is mapped by the permutation p. For a Hasse diagram with n vertexes this algorithm produces n! (= 1·2·3···n) adjacency matrixes (P′(1) can be chosen in n ways, P′(2) in n − 1 ways, P′(3) in n − 2 ways, and so on). The first permutation produced by this algorithm is stored in the array P. After that, each of the n! − 1 remaining permutations is successively compared to the one stored in P. If the new permutation is found to be "better" than the one in P, then the new permutation is stored in P. Finally, P contains the canonical (optimal) permutation.
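The trivial algorithm can be sketched directly with `itertools.permutations`. In this sketch "better" is taken to mean that the tuple of row codes is smaller when compared row by row from the top, which is one reading of the minimality criterion used in the text; the function names and the three-vertex example are hypothetical.

```python
from itertools import permutations


def matrix_code(adj, order):
    """Row codes of the adjacency matrix when vertex order[i] receives label i + 1."""
    n = len(order)
    pos = {v: i for i, v in enumerate(order)}
    return tuple(sum(1 << (n - 1 - pos[w]) for w in adj[v]) for v in order)


def brute_force_canonical(adj):
    """Check all n! labelings in lexicographic order and keep the first best one."""
    best_order, best_code = None, None
    for order in permutations(sorted(adj)):
        code = matrix_code(adj, order)
        if best_code is None or code < best_code:  # strictly "better" only
            best_order, best_code = order, code
    return best_order, best_code


# Path graph a - b - c as neighbour sets (hypothetical example).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(brute_force_canonical(path))  # (('a', 'c', 'b'), (1, 1, 6))
```

Because a stored permutation is replaced only by a strictly better one, ties between equivalent labelings are resolved in favor of the lexicographically first permutation. The n! growth makes this approach unusable beyond a handful of vertexes.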
Clearly, the trivial algorithm is very inefficient, and hence we need a different approach. We propose an algorithm in which one also successively chooses the values of P′(1), P′(2), P′(3), ..., but in much more restrictive ways. Suppose that the values of P′(1), P′(2), ..., P′(k) are already chosen. Our aim is to (restrictively) determine the possible candidates for the value of P′(k + 1) and to proceed in a similar manner. As in the case of the trivial algorithm, the first permutation produced by this algorithm is stored in the array P. After that, each of the remaining permutations produced by the algorithm is compared to the one stored in P, and if the new permutation is found to be "better", then it is stored in P. At the end P contains the canonical (optimal) permutation. More formally, we use the following recursive algorithm (presented in pseudo-code):

rec(k)
(1) If k = n do
    (1.1) If this is the first produced permutation then put P = P′
    (1.2) Else if P′ produces a "better" labeling then put P = P′
(2) Determine the set Cand of candidates for P′(k + 1)
(3) For each c ∈ Cand put P′(k + 1) = c and call rec(k + 1) (elements are successively chosen in ascending order)

The most important line of this algorithm is line (2), in which one needs to determine the set Cand. It is determined in four steps:

Step 1: To each vertex v ∈ V0, where V0 denotes the set of vertexes that are not already stored in the permutation, we assign the vector a_v = (i_1, ..., i_k) such that i_q = 1 if v and P′(q) are adjacent and 0 otherwise.

Step 2: The vectors a_v, v ∈ V0, are sorted according to their lexicographical order and ranked (the vertexes corresponding to the smallest vectors have rank 1, those immediately after them have rank 2, and so on). The maximal rank assigned in this way is denoted by mr.

Step 3: To each vertex v ∈ V0 of rank 1, we assign a vector b_v with mr entries in such a way that the i-th component of the vector b_v is the number of neighbors of v that have rank i.

Step 4: From the set of vectors b_v of the vertexes with rank 1, those that are minimal (according to the lexicographical order) are extracted, and the vertexes corresponding to them form the set Cand.

The part of the algorithm given in line (3) ensures that among all the permutations resulting in the same canonical adjacency matrix, the one that is smallest with respect to lexicographical order is chosen. Let us compare the efficiency of the proposed algorithm and the trivial one. In the case of the map shown in Figure 2 the trivial algorithm checks 20! ≈ 2.43 × 10^18 matrixes, while the proposed algorithm checks only 4 matrixes and finds the canonical (optimal) permutation.

Table 2. Canonical Labels and Scaled Coordinates (x, y) of the 29 Most Intensive Protein Spots in the Coomassie Brilliant Blue Stained 2-D Gel Electrophoresis Pattern of the Proteome from Liver Cells of Healthy Male Fisher F344 Rats and the Abundances, A_control, of the Proteins Making up These Spots As Well As the Relative Abundances, RA_i (i = PFOA, PFDA, Clofibrate, DEHP), of These Proteins in the Proteomes from Liver Cells of the Rats Treated with PFOA, PFDA, Clofibrate, and DEHP

canonical label   x        y        A_control   RA_PFOA   RA_PFDA   RA_clofibrate   RA_DEHP
27                0.7581   0.6145   136653      0.8332    1.0995    1.1975          0.0594
17                1.3974   0.5948   127195      0.7796    0.5745    0.6026          0.8813
10                0.9784   0.5286   114929      1.6744    1.9279    1.4451          1.5713
1                 0.8619   0.8661   112251      0.5214    0.3467    0.6517          0.6866
2                 0.9006   0.7160   98224       0.9280    0.8446    0.8572          0.9462
19                0.9439   0.4259   90004       1.4370    1.2484    1.2517          1.3266
14                1.3245   0.5271   84842       0.8700    0.5361    0.8476          1.1485
23                0.4112   0.4289   82492       0.8967    0.9027    1.0268          1.0734
12                1.3017   0.5781   80015       0.9662    1.0007    0.9502          1.2602
22                0.6747   0.5535   72173       1.0805    0.8365    0.6486          1.0824
26                0.7776   0.3970   64684       0.9819    0.5886    0.9023          1.1712
13                1.3413   0.4358   58977       2.4224    0.7838    0.8245          2.4859
24                0.6543   0.2499   58001       0.9749    0.9219    1.0383          1.2354
16                1.3258   0.5953   55402       0.8329    0.5984    1.0728          1.2460
28                0.8810   0.6354   49027       0.8670    1.0634    0.9394          1.4118
15                1.0396   0.4103   48976       1.6631    2.7300    1.3186          1.3471
21                1.1799   0.5673   48145       0.8389    0.5016    0.9884          1.0873
4                 0.7617   0.3936   42773       1.1466    1.8036    1.0756          1.3869
18                0.9385   0.5258   40923       1.4749    2.2973    1.9544          0.0938
29                0.8475   0.6359   36433       0.8410    0.8676    0.9267          1.2267
3                 0.8189   0.6286   35896       0.8833    0.6073    0.8086          0.9181
9                 0.9180   0.4241   31194       1.3537    1.3300    1.3637          2.2165
7                 0.7492   0.5521   30510       0.9748    1.0090    0.8673          1.2393
8                 0.7295   0.6137   29296       1.0263    1.3494    1.3382          0.9409
25                0.7475   0.3917   26155       0.9628    1.5907    0.8044          0.8859
5                 0.7699   0.3173   25389       0.8985    0.6803    0.8041          1.2152
6                 0.6597   0.5528   24006       1.1912    1.5306    2.0926          1.9225
20                0.7189   0.5526   22344       1.4279    0.8306    0.7795          0.8687
11                1.2132   0.5783   20142       0.6973    0.6795    0.7979          0.8477

"Fox Trail" Diagram of a Proteome Map. The outlined canonical labeling of the vertexes (proteins) can be combined with the scheme that uses a zigzag-like curve for the characterization of a proteome map and for the construction of the corresponding D/D matrix and a set of map invariants extracted from it.2-4,7,21-24 To wit, instead of using protein abundance as a "tool" for constructing a zigzag curve associated with the map, as is the case in the already established approaches, one can use the canonical labels of the vertexes of the embedded Hasse diagram and construct a zigzag curve by connecting the vertexes having consecutive numerical labels. Figure 3 shows such a curve constructed for the hypothetical proteome map pictured in Figure 1. As one can see, the curve crosses itself many times and closely resembles a fox trail, the experimentally observed positions of a fox at fixed time intervals.25,26 Therefore, we name a zigzag curve founded upon the canonical labels of the vertexes of an embedded Hasse diagram a "fox trail" diagram. An advantage of the "fox trail" diagram is that it does not utilize information on the abundance, except in selecting the initial n spots, if they are not selected otherwise. This may in some circumstances allow the construction of "fox trail" diagrams of proteome maps of unknown origin if the set of the proteins forming a basis for analysis is identified or if the list of coordinates of the "standard" proteins for comparative studies is composed.

Figure 3. "Fox trail" diagram of the proteome map shown in Figure 1.
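Two pieces of the preceding discussion lend themselves to a short sketch: the determination of the candidate set Cand (Steps 1 to 4) and the construction of the "fox trail" curve from canonical labels. The code below is one literal reading of the steps as stated, not the authors' implementation; in particular it assumes that in Step 3 only neighbors still in V0 carry a rank, and the function names, the path graph, and the coordinates are hypothetical.

```python
def candidates(adj, placed):
    """Vertices eligible for the next label, following Steps 1-4 literally.

    adj: dict vertex -> set of neighbours; placed: vertices already labelled,
    in label order. At least one vertex must still be unlabelled.
    """
    free = [v for v in adj if v not in placed]  # the set V0
    # Step 1: adjacency of each free vertex to the already placed vertices.
    a = {v: tuple(int(u in adj[v]) for u in placed) for v in free}
    # Step 2: rank the distinct vectors; the smallest vector gets rank 1.
    rank_of = {vec: r for r, vec in enumerate(sorted(set(a.values())), start=1)}
    rank = {v: rank_of[a[v]] for v in free}
    mr = len(rank_of)
    # Step 3: for each rank-1 vertex, count its free neighbours of each rank.
    b = {}
    for v in free:
        if rank[v] == 1:
            counts = [0] * mr
            for u in adj[v]:
                if u in rank:
                    counts[rank[u] - 1] += 1
            b[v] = tuple(counts)
    # Step 4: keep the vertices whose b-vector is lexicographically minimal.
    smallest = min(b.values())
    return sorted(v for v, vec in b.items() if vec == smallest)


def fox_trail(coords, label):
    """Spot coordinates visited in the order of canonical labels 1, 2, ..., n."""
    return [coords[s] for s in sorted(coords, key=label.get)]


path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(candidates(path, []))     # ['a', 'c']
print(candidates(path, ["a"]))  # ['c']

coords = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (2.0, 0.5)}
label = {"a": 1, "c": 2, "b": 3}
print(fox_trail(coords, label))  # [(0.0, 0.0), (2.0, 0.5), (1.0, 1.0)]
```

With nothing placed, all a-vectors are empty, so every vertex has rank 1 and Step 4 reduces to picking the vertexes of lowest degree, which matches the remark in the worked example that one should first look at the vertexes of lowest degree.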

Illustration of the Use of the Approach in Documentation. We will illustrate the use of the outlined approach in documentation with the proteome maps of liver cells of healthy male Fisher F344 rats (control group) and of rats treated with the following peroxisome proliferators: perfluorooctanoic acid (PFOA), perfluorodecanoic acid (PFDA), 2-(4-chlorophenoxy)-2-methylpropanoic acid ethyl ester (clofibrate), and di(ethylhexyl)phthalate (DEHP). Experimental details are given in ref 4. The first step is the selection of the proteins that will be considered. The set of 29 proteins making up the most intensive spots in the Coomassie brilliant blue stained 2-D gel electrophoresis pattern of the proteome from liver cells of the healthy rats has been selected, and it will be considered in all five proteome maps. Table 2 lists the scaled (x, y) coordinates of these 29 protein spots. The scaled coordinates are obtained by dividing the original coordinates of the spots by the maximal Euclidean distance between two spots on the map. We use dimensionless quantities, for instance, quotients of distances rather than distances themselves, to avoid the influence of gel size, staining method, etc. In Figure 4 we present a schematic representation of the simplified proteome maps of liver cells of the healthy and treated rats showing the positions of only the 29 protein spots listed in Table 2.

Figure 4. Schematic representation of the simplified proteome maps of liver cells of the healthy male Fisher F344 rats and the rats treated with PFOA, PFDA, clofibrate, and DEHP showing the positions of the spots of only 29 proteins listed in Table 2.

The next step is the construction of the poset (P = {(x_i, y_i) | i = 1, 2, ..., 29}, ≥) and its graphical representation, the Hasse diagram embedded in the schematic representation of the maps. Afterward, we determine the canonical labels of the vertexes of the Hasse diagram using the proposed algorithm. We had to check only 72 matrixes instead of the 29! ≈ 8.84 × 10^30 matrixes that the trivial algorithm would check. Figure 5 shows the canonically labeled embedded Hasse diagram associated with the "control", "PFOA", "PFDA", "clofibrate", and "DEHP" proteome maps of liver cells of male Fisher F344 rats. Note that the Hasse diagram possesses 4 automorphisms, forming the automorphism group in terms of which the symmetries (with respect to permutations of the vertexes) of the Hasse diagram are expressed.

Figure 5. Canonically labeled Hasse diagram embedded in the schematic representation of the simplified proteome maps of liver cells of the healthy male Fisher F344 rats and the rats treated with PFOA, PFDA, clofibrate, and DEHP.

The first 14 rows of the canonical adjacency matrix of the "control", "PFOA", "PFDA", "clofibrate", and "DEHP" proteome maps are shown in Table 3 (columns 14 to 29).

Table 3. Part of the Canonical Adjacency Matrix of the Embedded Hasse Diagram Shown in Figure 5

row\column  14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
 1           0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1
 2           0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1
 3           0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1
 4           0  0  0  0  0  0  0  0  0  0  0  1  1  0  0  0
 5           0  0  0  0  0  0  0  0  0  0  1  0  1  0  0  0
 6           0  0  0  0  0  0  0  0  1  1  0  0  0  0  0  0
 7           0  0  0  0  0  0  0  1  0  1  0  0  0  1  0  0
 8           0  0  0  0  0  0  1  0  1  0  0  0  0  1  0  0
 9           0  0  0  0  1  1  0  0  0  0  0  0  1  0  0  0
10           0  0  0  0  1  1  0  1  0  0  0  0  0  0  0  0
11           0  0  1  1  0  0  0  1  0  0  0  0  0  0  0  0
12           0  0  1  1  0  0  0  1  0  0  0  0  0  0  0  0
13           0  1  0  1  0  1  0  0  0  1  0  0  0  0  0  0
14           0  1  1  1  1  0  0  0  0  0  0  0  0  0  0  0

As these five proteome maps have the same canonical adjacency matrix, they must also have the same label, the canonical sequence of binary numbers, which in decimal notation reads: 1, 3, 7, 24, 40, 192, 324, 644, 3080, 3328, 12 544, 12 544, 21 568, 30 720, ..., 469 762 048. The repetition of a number in the sequence, like 12 544 in the above case, indicates the presence of symmetry-equivalent vertexes in the Hasse diagram. The common label of all five proteome maps is also a class label. To determine the ordering of two classes of proteome maps, we first consider the first terms of their canonical sequences and order the classes accordingly.
If the first terms are the same, then we compare the next terms in the sequences, and continue until we come upon two entries that are different, which then determines the ordering of the classes.

In Table 2, we list for the 29 considered proteins their abundances in the proteome from liver cells of the healthy male Fisher F344 rats and their relative abundances in the proteomes from liver cells of the rats treated with PFOA, PFDA, clofibrate, and DEHP. The relative abundance of a protein in a proteome, following Anderson and collaborators,27 is calculated by dividing its abundance in the proteome by its abundance in the corresponding "control" proteome (the proteome from the healthy unperturbed cell). In Table 4 we list in the first row the canonical abundance vector associated with the "control" proteome map, and in the remaining rows the canonical abundance vectors associated with the "PFOA", "PFDA", "clofibrate", and "DEHP" proteome maps. The components of the canonical abundance vector associated with the "control" proteome map (the leading map) are the explicitly given numerical values of the protein abundances, whereas the components of the canonical abundance vectors associated with the remaining four maps are the corresponding relative protein abundances. By this convention one can immediately distinguish the leading proteome map from the other maps of the same class.

Table 4. Canonical Abundance Vectors Associated with the "Control", "PFOA", "PFDA", "Clofibrate", and "DEHP" Proteome Map of Liver Cells of Male Fisher F344 Rats

proteome map   canonical abundance vector
control        (112251, 98224, 35896, 42773, 25389, 24006, 30510, 29296, 31194, ..., 36433)
PFOA           (0.5214, 0.9280, 0.8833, 1.1466, 0.8985, 1.1912, 0.9748, 1.0263, 1.3537, ..., 0.8410)
PFDA           (0.3467, 0.8446, 0.6073, 1.8036, 0.6803, 1.5306, 1.0090, 1.3494, 1.3300, ..., 0.8676)
clofibrate     (0.6517, 0.8572, 0.8086, 1.0756, 0.8041, 2.0926, 0.8673, 1.3382, 1.3637, ..., 0.9267)
DEHP           (0.6866, 0.9462, 0.9181, 1.3869, 1.2152, 1.9225, 1.2393, 0.9409, 2.2165, ..., 1.2267)
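The bookkeeping steps of this illustration (scaling the coordinates, building a canonical abundance vector, and ordering classes by their canonical sequences) can be sketched as follows. The function names are hypothetical; the label and abundance values are the five Table 2 rows carrying canonical labels 1 to 5, listed in table order, and the three points passed to `scale_coordinates` are made up for illustration. Python's built-in tuple comparison is used for the class ordering because it compares term by term, as described in the text.

```python
from itertools import combinations
from math import dist


def scale_coordinates(points):
    """Divide every coordinate by the largest Euclidean distance between two spots."""
    d_max = max(dist(p, q) for p, q in combinations(points, 2))
    return [(x / d_max, y / d_max) for x, y in points]


def canonical_abundance_vector(labels, values):
    """Reorder values by canonical label, smallest label first."""
    return [v for _, v in sorted(zip(labels, values))]


# Rows of Table 2 with canonical labels 1-5, in the order they appear in the table.
labels = [1, 2, 4, 3, 5]
a_control = [112251, 98224, 42773, 35896, 25389]
ra_pfoa = [0.5214, 0.9280, 1.1466, 0.8833, 0.8985]

print(canonical_abundance_vector(labels, a_control))
# [112251, 98224, 35896, 42773, 25389]
print(canonical_abundance_vector(labels, ra_pfoa))
# [0.5214, 0.928, 0.8833, 1.1466, 0.8985]

# Hypothetical raw coordinates; the largest pairwise distance is 5.
print(scale_coordinates([(0, 0), (3, 4), (3, 0)]))
# [(0.0, 0.0), (0.6, 0.8), (0.6, 0.0)]

# Class ordering: leading terms of the two canonical sequences quoted in the text.
hypothetical_map = (1, 2, 12, 28, 49)
rat_liver_class = (1, 3, 7, 24, 40)
print(hypothetical_map < rat_liver_class)  # True: the sequences first differ at the second term
```

The two reordered vectors reproduce the first five components of the "control" and "PFOA" rows of Table 4.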

Concluding Remarks

In this article, we have outlined a method for numerical representation of proteome maps. The method makes it possible to immediately group all the proteome maps of a given cell type of a given species into the same class, under the same canonical label. The canonical label of a proteome map is easily and quickly constructed using the proposed algorithm for the canonical labeling of the vertexes of the Hasse diagram embedded in a proteome map. The differentiation of individual proteome maps within a class is made by means of the canonical abundance vector associated with a proteome map. We feel that researchers in the field of proteomics will adopt the method owing to its simplicity and efficacy.

Acknowledgment. This work was supported in part by the Ministry of Science, Education and Sports of the Republic of Croatia and the Croatian-Slovenian project "Application of Methods of Discrete Mathematics in Chemistry and Biology". This manuscript is contribution no. 381 from the Center for Water and the Environment of the Natural Resources Research Institute. This material is based in part on research sponsored by the Air Force Research Laboratory, under agreement no. F49620-02-1-0138.

References

(1) Oxford Dictionary of Biochemistry and Molecular Biology; Oxford University Press: Oxford, 2003.
(2) Randić, M. On Graphical and Numerical Characterization of Proteomics Maps. J. Chem. Inf. Comput. Sci. 2001, 41, 1330-1338.
(3) Randić, M.; Zupan, J.; Novič, M. On 3-D Graphical Representation of Proteomics Maps and Their Numerical Characterization. J. Chem. Inf. Comput. Sci. 2001, 41, 1339-1344.
(4) Randić, M.; Witzmann, F.; Vračko, M.; Basak, S. C. On Characterization of Proteomics Maps and Chemically Induced Changes in Proteomes Using Matrix Invariants: Application to Peroxisome Proliferators. Med. Chem. Res. 2001, 10, 456-479.
(5) Randić, M. A Graph Theoretical Characterization of Proteomics Maps. Int. J. Quantum Chem. 2002, 90, 848-858.
(6) Randić, M.; Basak, S. C. A Comparative Study of Proteomics Maps Using Graph Theoretical Descriptors. J. Chem. Inf. Comput. Sci. 2002, 42, 983-992.
(7) Randić, M.; Novič, M.; Vračko, M. On Characterization of Dose Variations of 2-D Proteomics Maps by Matrix Invariants. J. Proteome Res. 2002, 1, 217-226.
(8) Bajzer, Ž.; Randić, M.; Plavšić, D.; Basak, S. C. Novel Map Descriptor for Characterization of Toxic Effects in Proteomics Maps. J. Mol. Graphics Modell. 2003, 22, 1-9.
(9) Randić, M.; Lerš, N.; Plavšić, D.; Basak, S. C. Characterization of 2-D Proteome Maps Based on Nearest Neighborhoods of Spots. Croat. Chem. Acta 2004, 77, 345-351.
(10) Randić, M.; Lerš, N.; Plavšić, D.; Basak, S. C. On Invariants of a 2-D Proteome Map Derived from Neighborhood Graphs. J. Proteome Res. 2004, 3, 778-785.
(11) Rosen, K. H. Discrete Mathematics and its Applications, 5th ed.; McGraw-Hill: Boston, 2003.
(12) Biggs, N. L. Discrete Mathematics, 2nd ed.; Oxford University Press: Oxford, 2003.
(13) Morgan, L. The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstract Services. J. Chem. Doc. 1965, 5, 107-113.
(14) Randić, M. On Unique Numbering of Atoms and Unique Codes for Molecular Graphs. J. Chem. Inf. Comput. Sci. 1975, 15, 105-108.
(15) Rücker, Ch.; Rücker, G. Mathematical Relation between Extended Connectivity and Eigenvector Coefficients. J. Chem. Inf. Comput. Sci. 1994, 34, 534-538.
(16) Randić, M.; Plavšić, D. On the Concept of Molecular Complexity. Croat. Chem. Acta 2002, 75, 107-116.
(17) Randić, M.; Plavšić, D. On Characterization of Molecular Complexity. Int. J. Quantum Chem. 2003, 91, 20-31.
(18) Randić, M. On the Recognition of Identical Graphs Representing Molecular Topology. J. Chem. Phys. 1974, 60, 3920-3928.
(19) Randić, M. On Rearrangement of the Connectivity Matrix of a Graph. J. Chem. Phys. 1975, 62, 309-310.
(20) Randić, M. Systematic Study of Symmetry Properties of Graphs. I. Petersen Graph. Croat. Chem. Acta 1977, 49, 643-655.
(21) Randić, M.; Kleiner, F. A.; DeAlba, L. M. Distance/Distance Matrixes. J. Chem. Inf. Comput. Sci. 1994, 34, 277-286.
(22) Randić, M.; Krilov, G. On Characterization of the Folding of Proteins. Int. J. Quantum Chem. 1999, 75, 1017-1026.
(23) Randić, M.; Vračko, M.; Lerš, N.; Plavšić, D. Novel 2-D Graphical Representation of DNA Sequences and Their Numerical Characterization. Chem. Phys. Lett. 2003, 368, 1-6.
(24) Randić, M.; Vračko, M.; Lerš, N.; Plavšić, D. Analysis of Similarity/Dissimilarity of DNA Sequences Based on Novel 2-D Graphical Representation. Chem. Phys. Lett. 2003, 371, 202-207.
(25) Sniff, D. H.; Jesson, C. R. Simulation Model of Animal Movement Patterns. Adv. Ecol. Res. 1969, 6, 185-220.
(26) Hall, G. G. Modelling - A Philosophy for Applied Mathematics. Bull. Inst. Math. Appl. 1972, 8, 226-228.
(27) Anderson, N. L.; Esquer-Blasco, R.; Richardson, F.; Foxworthy, P.; Eacho, P. The Effects of Peroxisome Proliferators on Protein Abundances in Mouse Liver. Toxicol. Appl. Pharmacol. 1996, 137, 75-89.

PR050049+